-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix image size calculation in importer #9115
Fix image size calculation in importer #9115
Conversation
@@ -397,6 +418,7 @@ func importRepositoryFromDocker(ctx gocontext.Context, retriever RepositoryRetri | |||
importDigest.Err = err | |||
continue | |||
} | |||
importDigest.Err = calculateImageSize(ctx, repo, importDigest.Image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we calculcate the size only when it's not there. Not all registries we use are hub.docker.com, for that cases (where v1 schema is supported) it's unnecessarily calculates the size.
This could be done as a followup to this issue, we need to unblock the queue, atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the pattern we are using above (if err := ...; err != nil { importDigest.Err = err; continue }
) for readability.
This definitely will be broken on v1, which does not have layer sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a more natural fit inside of schema1ToImage
to build a complete Image
object. The additional fanout is less than beautiful, but it would keep the API looking more consistent. schema1ToImage
would return a complete and valid look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make sure the integration test is properly testing v1 sizes against registry.access.redhat.com?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema v1 supported by hub.docker.com. Before that we used is not documented structure to calculate the size.
v1Compatibility string
V1Compatibility is the raw V1 compatibility information. This will contain the JSON
object describing the V1 of this image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not using the descriptor on the layers reference to calculate
the size? We already have that data in schema on layers
- in
schema2toManifest (which this PR should create) we would be able to
calculate the fake size.
On Wed, Jun 1, 2016 at 1:53 PM, Clayton Coleman [email protected] wrote:
It doesn't matter that it's not documented - it's what is deployed in
production in many places still.On Wed, Jun 1, 2016 at 1:52 PM, Alexey Gladkov [email protected]
wrote:In pkg/image/importer/importer.go
#9115 (comment):@@ -397,6 +418,7 @@ func importRepositoryFromDocker(ctx gocontext.Context, retriever RepositoryRetri
importDigest.Err = err
continue
}
importDigest.Err = calculateImageSize(ctx, repo, importDigest.Image)
Schema v1 supported by hub.docker.com. Before that we used is not
documented structure to calculate the size.v1Compatibility string
V1Compatibility is the raw V1 compatibility information. This will contain the JSON
object describing the V1 of this image.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openshift/origin/pull/9115/files/492baa03f23cf45ca229c3501c29ca4bccf546f3#r65409920,
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p0YnP3kHQqbqW47WCSl_-TNMETGtks5qHcbcgaJpZM4IrzC2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need to call stat
On Wed, Jun 1, 2016 at 1:56 PM, Clayton Coleman [email protected] wrote:
Why are we not using the descriptor on the layers reference to calculate
the size? We already have that data in schema onlayers
- in
schema2toManifest (which this PR should create) we would be able to
calculate the fake size.On Wed, Jun 1, 2016 at 1:53 PM, Clayton Coleman [email protected]
wrote:It doesn't matter that it's not documented - it's what is deployed in
production in many places still.On Wed, Jun 1, 2016 at 1:52 PM, Alexey Gladkov [email protected]
wrote:In pkg/image/importer/importer.go
#9115 (comment):@@ -397,6 +418,7 @@ func importRepositoryFromDocker(ctx gocontext.Context, retriever RepositoryRetri
importDigest.Err = err
continue
}
importDigest.Err = calculateImageSize(ctx, repo, importDigest.Image)
Schema v1 supported by hub.docker.com. Before that we used is not
documented structure to calculate the size.v1Compatibility string
V1Compatibility is the raw V1 compatibility information. This will contain the JSON
object describing the V1 of this image.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openshift/origin/pull/9115/files/492baa03f23cf45ca229c3501c29ca4bccf546f3#r65409920,
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p0YnP3kHQqbqW47WCSl_-TNMETGtks5qHcbcgaJpZM4IrzC2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unnecessarily calculates the size.
@soltysh @smarterclayton I added the check to calculate the size only when it is equal to zero. This would eliminate unnecessary requests if we have all the information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned because each of those stats could fail, resulting in
other failures. We probably don't want to fail import if size can't be
calculated. We probably don't want to add a fan-out of 20 requests on
setup.
I'd prefer to not fix this until we have schema2 support. I thought we
were getting a schema2 response without size.
On Wed, Jun 1, 2016 at 2:22 PM, Alexey Gladkov [email protected]
wrote:
In pkg/image/importer/importer.go
#9115 (comment):@@ -397,6 +418,7 @@ func importRepositoryFromDocker(ctx gocontext.Context, retriever RepositoryRetri
importDigest.Err = err
continue
}
importDigest.Err = calculateImageSize(ctx, repo, importDigest.Image)
it's unnecessarily calculates the size.
@soltysh https://github.com/soltysh @smarterclayton
https://github.com/smarterclayton I added the check to calculate the
size only when it is equal to zero. This would eliminate unnecessary
requests if we have all the information.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openshift/origin/pull/9115/files/492baa03f23cf45ca229c3501c29ca4bccf546f3#r65415556,
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p3qVdLRkR2IkXDwpJ7kJ7xgChqOKks5qHc3zgaJpZM4IrzC2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned because each of those stats could fail, resulting in other failures. We probably don't want to fail import if size can't be calculated. We probably don't want to add a fan-out of 20 requests on setup.
Yes, this is possible. We use NewRetryRepository to make few retries. But fix is necessary for the scheme1.
size := int64(0) | ||
for i := range image.DockerImageLayers { | ||
layer := &image.DockerImageLayers[i] | ||
desc, err := bs.Stat(ctx, digest.Digest(layer.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fanout is unfortunate. TODO
for a better solution or maybe a cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment but I don't think we need the stat - we should already have desc.Size when we get the image metadata in schema1ToImage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema1ToImage
is only about image creation. It's not getting digests and do not have anything to calculate the size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the v2 spec size must be filled out in layers. Is that not
the case?
On Wed, Jun 1, 2016 at 2:09 PM, Alexey Gladkov [email protected]
wrote:
In pkg/image/importer/importer.go
#9115 (comment):@@ -282,6 +282,27 @@ func applyErrorToRepository(repository *importRepository, err error) {
}
}+func calculateImageSize(ctx gocontext.Context, repo distribution.Repository, image *api.Image) error {
- bs := repo.Blobs(ctx)
- layerSet := sets.NewString()
- size := int64(0)
- for i := range image.DockerImageLayers {
layer := &image.DockerImageLayers[i]
desc, err := bs.Stat(ctx, digest.Digest(layer.Name))
schema1ToImage is only about image creation. It's not getting digests and
do not have anything to calculate the size.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openshift/origin/pull/9115/files/492baa03f23cf45ca229c3501c29ca4bccf546f3#r65413168,
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p8C7_D83OxQ7joXFIweSMM4eSYOUks5qHcrygaJpZM4IrzC2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema v1 don't have the size:
Schema v2 have it:
But right now we talking about schema v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's acceptable.
Our tests show that we had a regression after upgrading the hub.docker.com. But OK, you know better.
this is going to be less of a problem soon.
It will be less of a problem only then when there will be no schema1 on the servers.
we need to fix right now is the tests
I know that now is broken: TestImageStreamImport
, TestImageStreamImportDockerHub
, but there could be more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how you can help upgrading the dockerregisrty. My PR is also affected by this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is acceptable if in the master branch of Origin we explicitly call out that importing schema1 from the DockerHub (that have been down converted from schema2) will not have size set, because we anticipate this problem resolving itself by a) more images on the hub being schema2, b) we want to support schema2 import on the rebase.
I'll open a PR to disable the tests now and we can lower the priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open a PR to disable the tests now and we can lower the priority.
OK. Then I close this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened test change in #9118
On Wed, Jun 1, 2016 at 3:41 PM, Alexey Gladkov [email protected]
wrote:
In pkg/image/importer/importer.go
#9115 (comment):@@ -282,6 +282,27 @@ func applyErrorToRepository(repository *importRepository, err error) {
}
}+func calculateImageSize(ctx gocontext.Context, repo distribution.Repository, image *api.Image) error {
- bs := repo.Blobs(ctx)
- layerSet := sets.NewString()
- size := int64(0)
- for i := range image.DockerImageLayers {
layer := &image.DockerImageLayers[i]
desc, err := bs.Stat(ctx, digest.Digest(layer.Name))
I'll open a PR to disable the tests now and we can lower the priority.
OK. Then I close this PR.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/openshift/origin/pull/9115/files/492baa03f23cf45ca229c3501c29ca4bccf546f3#r65428383,
or mute the thread
https://github.com/notifications/unsubscribe/ABG_p_Cre1dOt6e_KNPE4GhqBex9VKe5ks5qHeB4gaJpZM4IrzC2
.
4ebd85b
to
4bcc443
Compare
4bcc443
to
4e57bf3
Compare
[test] |
@smarterclayton @soltysh I added caching for layer size. |
@@ -474,6 +474,8 @@ func ImageWithMetadata(image *Image) error { | |||
image.DockerImageLayers[i].Name = layer.DockerBlobSum | |||
} | |||
if len(manifest.History) == len(image.DockerImageLayers) { | |||
// This code does not work since hub.docker.com has been updated. | |||
// The V1Compatibility no longer contains Size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be true for docker hub, but for other repos supporting V1, it's not. I'd drop this comment entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is valid for all v2 -> v1 conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase it in following way:
"This code does not work for images imported from v2 schema registries or converted from v2 to v1, since schema v2 does not contain size information."
Or something more appropriate, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh Fixed
4e57bf3
to
ce7b037
Compare
The changes itself looks good, but I won't let this in without tests. We're having too many problems with images to let any code in without at least simple unit test. |
@@ -474,6 +474,9 @@ func ImageWithMetadata(image *Image) error { | |||
image.DockerImageLayers[i].Name = layer.DockerBlobSum | |||
} | |||
if len(manifest.History) == len(image.DockerImageLayers) { | |||
// This code does not work for images imported from v2 schema | |||
// registries or converted from v2 to v1, since schema v2 does not | |||
// contain size information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since schema v2 does not contain size information.
It does, it's just hidden in layer descriptors. This information is lost during conversion to v1 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So current comment is accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So current comment is accurate.
It's not. Because schema v2 contains size information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the last part is wrong.
ce7b037
to
19fdd11
Compare
@@ -78,17 +82,28 @@ func (i *ImageStreamImporter) contextImageCache(ctx gocontext.Context) map[manif | |||
return cache | |||
} | |||
|
|||
type imageStreamImporterCache struct { | |||
ImageCache map[manifestKey]*api.Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have digestToRepositoryCache, why do we need a second cache?
19fdd11
to
add6457
Compare
@@ -50,6 +50,9 @@ type ImageStreamImporter struct { | |||
limiter flowcontrol.RateLimiter | |||
|
|||
digestToRepositoryCache map[gocontext.Context]map[manifestKey]*api.Image | |||
|
|||
// digestToLayerSizeCache maps layer digests to size. | |||
digestToLayerSizeCache map[string]int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it thread-safe. And let it be passed to NewImageStreamImporter
so it can be shared among multiple importers invoked by image import controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I think it should not be part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to share it, then it needs to be an LRU.
} | ||
} | ||
t.Fatalf("unexpected request %s: %#v", r.URL.Path, r) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a high potential for sharing most of this handler's code with the other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense if there is another such test.
add6457
to
7159082
Compare
After upgrading docker/distibution V1Compatibility no longer contains Size. For this reason, the image size calculation is wrong. A new way to calculate the size repeats the way that dockerregistry calculates the size.
This reverts commit b30f322.
7159082
to
9399f53
Compare
Flake from #9766. |
@@ -50,6 +50,9 @@ type ImageStreamImporter struct { | |||
limiter flowcontrol.RateLimiter | |||
|
|||
digestToRepositoryCache map[gocontext.Context]map[manifestKey]*api.Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton Can you explain, why is it necessary to group the entries by context? Does the same scenario apply to the digestToLayerSizeCache
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because access to an image may only be allowed given the credentials associated with your context. So you cannot retry it. It does not apply to digest to layer size.
LGTM, need @smarterclayton approval and answer @miminar question. Also please link an issue/PR addressing multithreaded access to the cache. |
Created #9779 for a proper caching if this PR neglects it. |
Signed-off-by: Gladkov Alexey <[email protected]>
9399f53
to
26dad6d
Compare
Evaluated for origin test up to 26dad6d |
@smarterclayton It's ready for merge ? |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/6110/) |
LGTM |
I don't see any objections so merging. [merge] |
Evaluated for origin merge up to 26dad6d |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/6110/) (Image: devenv-rhel7_4580) |
After upgrading docker/distibution V1Compatibility no longer contains Size. For
this reason, the image size calculation is wrong. A new way to calculate the
size repeats the way that dockerregistry calculates the size.
Fix #7706
@smarterclayton @pweil- @mfojtik @miminar @Kargakis