download progress? #670

deitch · 2020-02-04T16:32:10Z

When doing a pull, is it possible to get download progress? It is fairly straightforward to calculate the entire size from an image by adding up the layers sizes. If it is possible to get download updates, it would be easy to calculate percentage done.

jonjohnsonjr · 2020-02-04T17:56:15Z

I've avoided this because I'm not sure how to implement it without pulling in a ton of dependencies. Doing cross-platform tty detection properly seemed nontrivial as well. I'm also not sure how I'd plumb it through the library and what should be responsible for managing it.

I'm open to a PR if it makes sense, but I wouldn't really know where to start.

If you wanted to implement this without needing to change go-containerregistry, you could probably implement a clever http.RoundTripper that looks for */manifests/* and */blobs/* in the path of http requests/responses to get a signal for what's going on by parsing the manifests and using a TeeReader when downloading blobs.

deitch · 2020-02-04T18:25:17Z

Oh quite. I think that outputting the status should be a client usage problem. But if the download had a way to report it without dealing with ttys and such, it would be easier.

I was thinking something like:

downloaded := make(chan int64)
err := v1tarball.WriteToFileWithUpdates(path, ref, img, downloaded)

And the various write funcs would send periodic "I have downloaded this much cumulatively" to the channel over time. It is up to the consumer to use it or not, and decide if it wants to send the output to a tty or something else.

This at least would provide a reliable basis for it.

deitch · 2020-02-10T17:11:48Z

Any thoughts on this? I had some time to mull it over, and I like the approach. It basically lets the library wash its hands of how it is used, just says, "give me a channel, and I will send updates there."

jonjohnsonjr · 2020-02-10T21:24:25Z

I'm not really sure where this should live.

We could pass something into remote.Image that could tell us progress when downloading layers from the registry, or we could pass something into tarbal.Write... that could tell us progress when writing to the tarball.

It's unclear which is more correct or efficient (especially given the matrix of source -> sink combinations).

If the progress bar is in the reader, we can more efficiently calculate the Total size of a given layer. E.g. with remote, we know we can access the image manifest to get the blob size. With a tarball, we know we can stat the files to determine the file size.

On the other hand, the writer knows if it's going to skip certain layers or not e.g. if the tarball already has some base image layers present.

It seems to me that accurately (and efficiently) calculating progress requires some context from both the source and the sink.

Source	Blob Set	Blob Sizes	Read Progress
`remote`	`Manifest()`	`Manifest()`	`http.Response.Body`
`tarball`	`manifest.json`	`tar.Header`	`tar.Reader`
...

Sink	Blobs Skipped	Write Progress
`remote`	`HEAD .../blobs/...`	`http.Request.Body`
`tarball`	`tar.Header`	`tar.Writer`
...

Docker has a progress package that we might want to use? https://github.com/moby/moby/blob/master/pkg/progress/progress.go

If you want to convey a lot of information like layer granularity, skipped layers, mounted layers (like docker), we probably need to come up with something similar to what they do. If you just want a total percentage (or even just bytes downloading and/or written), we could also do that... but I think we need something that we can pass into every package so that each package can signal some events. It might make sense to standardize on some events, as well. Docker's xfer package seems kind of complex, too: https://github.com/moby/moby/tree/master/distribution/xfer

deitch · 2020-02-11T13:49:01Z

Docker's xfer package seems kind of complex, too

It is, but in the end, it does something similar to what I was suggesting: sends the Download() function a chan to which it can send updates. It uses a chan<- Progress, which type Progress is a nice struct with lots of information about totals and current and everything. I don't know how necessary that is. I think we could aim for simple that has one API call: when I ask for something to be made available (as a tar file, directory, whatever), it usually knows at that point the total bytes to be placed, and it can know progress as it goes along. I don't think we need to get too fancy with bytes per layer and such.

So maybe the channel is not chan int64 but maybe instead chan Progress where

type Progres struct {
    total int64
    downloaded int64
}

And let others deal with it. You don't really know the total until you start to download it, since you might just have a ref to an image index, which only is resolved to a particular image (and therefore its manifest, and therefore its size) once you start to download.

It's unclear which is more correct or efficient (especially given the matrix of source -> sink combinations).

I am unsure of the part that is unsure. I would think that progress only matters when downloading, which means when calling a writer, is it not? When I retrieve a manifest, progress isn't really relevant. It is only when I am writing v1tarball.WriteToFile or legacytarball.Write or v1layout.Write that progress is relevant.

Yes, we need information from the source (i.e. image info) when writing in order to get progress info, but don't we pass that into the writer anyways? E.g. if I want to pull down an image:

// getting image info, manifest, etc.
desc, err = remote.Get(ref, options...)
img, err = desc.Image()
tag, ok := ref.(name.Tag)
switch pullWriteFormat {
case FORMATV1:
        err = v1tarball.WriteToFile(writePath, tag, img)
case FORMATLEGACY:
        w, err := os.Create(writePath)
        defer w.Close()
        err = legacytarball.Write(tag, img, w)
case FORMATLAYOUT:
        ii, err := desc.ImageIndex()
        _, err = layout.Write(writePath, ii)
}

all of the source info was passed in at the writing stage anyways in the case statements.

jonjohnsonjr · 2020-02-12T00:51:44Z

I don't think we need to get too fancy with bytes per layer and such.

It would be nice to be able to represent this information somehow, though...

I would think that progress only matters when downloading,

I'm not sure about this -- we'd want progress when uploading this too (for pushing). It might make some sense to get progress when just converting between formats, but local disk -> disk stuff is probably fast enough that it doesn't matter.

which means when calling a writer, is it not?

The point I was trying to make is that it might be easier to instrument the source of bytes to get progress than in the writer path, but after giving it some thought, I don't think it really matters.

Yes, we need information from the source (i.e. image info) when writing in order to get progress info, but don't we pass that into the writer anyways?

I guess it's reasonable to just track progress on the writer side of things... I'd expect this to look something like a WithProgress option that gets passed to Write functions. I don't have a ton of free time right now to dig into this, but if you can put together a PR I'm happy to review it.

deitch · 2020-02-17T18:03:17Z

Take a look at #675 as a starting point, @jonjohnsonjr . I can expand it as needed to the other formats; I just wanted to get the conversation moved to something tangible.

github-actions · 2020-10-27T01:48:40Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

jonjohnsonjr · 2021-01-05T21:25:36Z

@deitch this got closed because it was stale, but did we get to a happy point where it's ~fixed?

deitch · 2021-01-06T11:32:29Z

I think so. This was only for tarball with #675 but that should be good enough for now. If we come across needing it elsewhere, we can worry about it then.

deitch mentioned this issue Feb 17, 2020

add download progress option for v1.tarball #675

Merged

imjasonh mentioned this issue Feb 21, 2020

Use go-containerregistry to push and pull tektoncd/experimental#468

Merged

afbjorklund mentioned this issue Mar 11, 2020

show progress bar for downloading kic base image kubernetes/minikube#7012

Closed

github-actions bot added the lifecycle/stale label Oct 27, 2020

github-actions bot closed this as completed Nov 27, 2020

preminger mentioned this issue Oct 11, 2023

No progress bars in push/pull flows via go-containerregistry sylabs/singularity#2212

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

download progress? #670

download progress? #670

deitch commented Feb 4, 2020

jonjohnsonjr commented Feb 4, 2020

deitch commented Feb 4, 2020

deitch commented Feb 10, 2020

jonjohnsonjr commented Feb 10, 2020

deitch commented Feb 11, 2020

jonjohnsonjr commented Feb 12, 2020

deitch commented Feb 17, 2020

github-actions bot commented Oct 27, 2020

jonjohnsonjr commented Jan 5, 2021

deitch commented Jan 6, 2021

download progress? #670

download progress? #670

Comments

deitch commented Feb 4, 2020

jonjohnsonjr commented Feb 4, 2020

deitch commented Feb 4, 2020

deitch commented Feb 10, 2020

jonjohnsonjr commented Feb 10, 2020

deitch commented Feb 11, 2020

jonjohnsonjr commented Feb 12, 2020

deitch commented Feb 17, 2020

github-actions bot commented Oct 27, 2020

jonjohnsonjr commented Jan 5, 2021

deitch commented Jan 6, 2021