Mitigate dockerhub changes rolling out Nov 1st #19477
/assign @BenTheElder |
/area images |
If all the images used are the ones listed here, there are only 8 images; it seems easy to move them to gcr.io
|
@aojea yeah, we should probably start there and mirror these images used as part of e2e tests into k8s.gcr.io (FYI @dims @thockin). I would probably put them in the same repo we're putting other e2e test images. I took a swag at images pulled by kubelets during a run of https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default. It's shell, and I haven't verified whether the specific log line(s) this pulls out represent dockerhub's definition of a pull, or whether it's just the first "real" pull that counts.
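A minimal sketch of that kind of extraction, assuming a locally available kubelet.log and assuming that "Pulling image" log lines are a reasonable proxy for registry pulls (neither assumption has been verified against dockerhub's accounting):

# List the images kubelet reports pulling, with a count per image.
# kubelet.log is a placeholder path; adjust for the node/log source in use.
grep -o 'Pulling image "[^"]*"' kubelet.log \
  | sed 's/Pulling image "\(.*\)"/\1/' \
  | sort | uniq -c | sort -rn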
|
Here's what kubelet pulls for each node in the k8s-infra-prow-build cluster.
This doesn't catch pulls that are done as part of jobs that run on nodes, still looking... |
Looks like we could stand to mirror golang, node, python and alpine too.
My concern is we probably don't want to end up paying to serve these much more commonly used images to non-kubernetes projects. |
These might be available on mirror.gcr.io https://cloud.google.com/container-registry/docs/pulling-cached-images |
Note that the most popular images are in mirror.gcr.io, which our CI nodes and our clusters should generally be using* (* not kind, and possibly not docker-in-docker; the latter is easy to move to this if it's not already). Something like glusterdynamic-provisioner is most likely not in it. |
@spiffxp @claudiubelu Turns out:
gcloud container images list --repository=mirror.gcr.io/library
NAME
mirror.gcr.io/library/alpine
mirror.gcr.io/library/bash
mirror.gcr.io/library/buildpack-deps
mirror.gcr.io/library/busybox
mirror.gcr.io/library/centos
mirror.gcr.io/library/chronograf
mirror.gcr.io/library/consul
mirror.gcr.io/library/couchdb
mirror.gcr.io/library/debian
mirror.gcr.io/library/docker
mirror.gcr.io/library/elasticsearch
mirror.gcr.io/library/flink
mirror.gcr.io/library/ghost
mirror.gcr.io/library/golang
mirror.gcr.io/library/haproxy
mirror.gcr.io/library/hello-world
mirror.gcr.io/library/httpd
mirror.gcr.io/library/kong
mirror.gcr.io/library/mariadb
mirror.gcr.io/library/matomo
mirror.gcr.io/library/maven
mirror.gcr.io/library/memcached
mirror.gcr.io/library/mongo
mirror.gcr.io/library/mongo-express
mirror.gcr.io/library/mysql
mirror.gcr.io/library/nginx
mirror.gcr.io/library/node
mirror.gcr.io/library/openjdk
mirror.gcr.io/library/percona
mirror.gcr.io/library/perl
mirror.gcr.io/library/php
mirror.gcr.io/library/postgres
mirror.gcr.io/library/python
mirror.gcr.io/library/rabbitmq
mirror.gcr.io/library/redis
mirror.gcr.io/library/ruby
mirror.gcr.io/library/solr
mirror.gcr.io/library/sonarqube
mirror.gcr.io/library/telegraf
mirror.gcr.io/library/traefik
mirror.gcr.io/library/ubuntu
mirror.gcr.io/library/vault
mirror.gcr.io/library/wordpress
mirror.gcr.io/library/zookeeper

gcloud container images list-tags mirror.gcr.io/library/busybox
DIGEST TAGS TIMESTAMP
c9249fdf5613 latest 2020-10-14T12:07:34
2ca5e69e244d 2020-09-09T03:38:02
fd4a8673d034 1.31,1.31.1 2020-06-02T23:19:57
dd97a3fe6d72 1.31.0 2019-09-04T21:20:16
e004c2cc521c 1.29 2018-12-26T09:20:43 |
Interesting. But we actually have a new type of issue I didn't think about at the meeting: support for multiple architecture types. And it seems like I'm right:
|
Naive question: do you think that CIs on other architectures can hit dockerhub limits? |
Good question, actually. Doing a grep in test-infra/config/jobs, I can see:
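For reference, a rough version of that kind of search (the architecture names and path are illustrative assumptions; the exact pattern used isn't shown here):

# Illustrative: find job configs that mention non-amd64 architectures.
grep -rl -e 'arm64' -e 'ppc64le' -e 's390x' config/jobs | sort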
Then, there are these boards: https://testgrid.k8s.io/conformance-arm . The conformance-* boards have periodic jobs. I'm not sure whether the sig-node-* jobs are periodic, but they're all running conformance tests. Of the docker images you've listed, though, not all of them are commonly used:
That leaves us with 5 images that are used in most conformance runs: busybox, nginx, nginx-new, httpd, httpd-new. It also depends on how many nodes are in the cluster: if there are 2 nodes, the images will be pulled twice (we use 2 nodes for Windows test runs, for example). After that, we'd have to take a look at all the image-building jobs. There are quite a few of them, unfortunately: https://cs.k8s.io/?q=BASEIMAGE&i=nope&files=&repos= . Typically those are postsubmit jobs and they don't run that often, but I suspect there's a higher chance of multiple jobs running at the end of a cycle. We could switch to gcr.io base images? |
These all end up invoking GCB builds which is where the image pulls would happen. I am less concerned about these, mostly because I doubt they run at a volume that would cause any rate limiting even if they all theoretically ran on the same instance. I am more concerned about the long tail of jobs on prow's build cluster causing a specific node to get rate-limited. |
That makes sense; this should be less concerning for us then. Although GCB could have the same issues, since we're not the only ones building images through GCB. I've read the dockerhub FAQ, and I saw this:
This should apply to us too, right? If so, have we applied yet? In any case, we can configure the tests to use the [1] (test-infra/kubetest/aksengine.go, line 1257 at 882eb3f)
[2]
The FAQ also mentions a pull-through cache mirror registry. However, it doesn't mention how it behaves with manifest lists. My first guess is that it doesn't handle manifest lists, and just pulls / caches the images for the platform it's on. This would mean that if we tried this option, we'd need different mirrors, one for each architecture type we're currently testing. |
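For context, a pull-through cache of the kind the FAQ describes can be stood up with the open-source registry image; a minimal sketch (name and port are placeholders), with the manifest-list caveat above still untested:

# Run a local pull-through cache of Docker Hub (illustrative settings).
docker run -d --name dockerhub-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2
# Clients would then list http://localhost:5000 as a registry mirror.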
We should not be referencing mirror.gcr.io directly, but configuring docker and containerd to use it: https://cloud.google.com/container-registry/docs/pulling-cached-images#configure. The k8s-infra-prow-build nodes have containerd set up to use it:
as does k8s-prow-builds (aka the 'default' build cluster)
it is unclear to me whether this is picked up by any pods that try to explicitly run docker commands |
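For reference, the containerd side of that configuration looks roughly like the following; a sketch only, since the exact file layout and plugin config version on the build-cluster node images may differ:

# Sketch: have containerd's CRI plugin try mirror.gcr.io before Docker Hub
# for docker.io images, then restart containerd to pick up the change.
cat <<'EOF' >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://mirror.gcr.io", "https://registry-1.docker.io"]
EOF
systemctl restart containerd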
Clusters stood up using kube-up.sh also have this enabled by default: https://github.com/kubernetes/kubernetes/blob/ededd08ba131b727e60f663bd7217fffaaccd448/cluster/gce/config-default.sh#L163-L164 and while under test: which causes mirror.gcr.io to be set as a registry mirror url here: which is then set up as the registry mirror here: https://github.com/kubernetes/kubernetes/blob/ededd08ba131b727e60f663bd7217fffaaccd448/cluster/gce/gci/configure-helper.sh#L1470-L1475 |
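For illustration, the end result on a node amounts to pointing the docker daemon at the mirror; a sketch under the assumption that dockerd is the runtime (kube-up plumbs this differently, via the scripts linked above):

# Sketch: configure dockerd to use mirror.gcr.io as a registry mirror
# (equivalent to starting dockerd with --registry-mirror=https://mirror.gcr.io).
cat <<'EOF' > /etc/docker/daemon.json
{ "registry-mirrors": ["https://mirror.gcr.io"] }
EOF
systemctl restart docker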
|
Image retention isn't the concern here, but good to know. They're still planning to move forward with pull rate-limits |
@claudiubelu mentioned this issue in the weekly SIG-Windows meeting last week. I work closely with the Docker Hub team and wanted to call out a few things here:
|
Note that the GCE e2e clusters (created during testing) are generally using ephemeral IPs, so we have no guarantee that a previous user of an IP wasn't performing many pulls. I looked, and I think the build clusters at least aren't NATed and the VMs are relatively long-lived. There are also places other than our CI to consider, e.g. third-party CI. We may not want to continue using images from dockerhub, even while mitigating for ourselves, if others have to work around it too long term (vs. images on e.g. k8s.gcr.io, where users are not limited). |
Per https://www.docker.com/blog/expanded-support-for-open-source-software-projects/
Does this mean this issue might no longer be a problem? |
The OSS projects agreement has some strings attached:
source: I applied (a non-kubernetes project) and got an email |
Huh... thanks @howardjohn
Sounds reasonable?
I think in our case that might require steering's input ...
Seems reasonable.
... versus podman, cri-o, containerd etc.? 😕
Seems unclear. 🤔 |
/me wearing my own cap (not steering) LOL. nice try docker! |
@howardjohn Yeah, I am waiting on an answer as to whether the benefit we get for this cost is unlimited pulls of other images by our CI. We don't publish to dockerhub, and my read of the benefits was unlimited pulls of our images for other users. @aledbf if every image we happen to pull from dockerhub has applied and been exempted from rate limiting, that may lessen the impact |
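As an aside, a node's current anonymous pull-limit status can be checked against Docker Hub's documented rate-limit test repository; a sketch assuming curl and jq are available on the node:

# Fetch an anonymous pull token for the rate-limit test repo, then read the
# RateLimit-* headers Docker Hub returns on a manifest HEAD request.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -s --head -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i ratelimit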
Docker has removed the bottom 2 bullets on @howardjohn's post from all public communications, and replaced them with -
I think that this is a better description of the kind of support that we are looking for... @spiffxp, I have your response on my to-do list, and will try to get back to you before the end of today. |
We've gotten reports from downstream users who don't use mirroring in their CI that kubernetes e2e tests are occasionally hitting rate limits. Opened kubernetes/kubernetes#97027 to cover moving those images/tests to a community-owned registry |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale kubernetes/kubernetes#97027 has probably addressed the bulk of this for kubernetes/kubernetes. We can enforce that jobs should be using k8s-staging or k8s.gcr.io images. Beyond that, I think any piecemeal followup for specific subprojects should be considered out of scope for this, unless anyone has suggestions |
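A rough sketch of how that enforcement could be spot-checked against the job configs (the path and allowed-registry prefixes are assumptions, not an existing presubmit check):

# List image references in job configs that fall outside community-owned
# registries; the allowlist below is illustrative.
grep -rhoE 'image: *[^ ]+' config/jobs \
  | awk '{print $2}' \
  | grep -vE '^(gcr\.io/k8s-staging-|k8s\.gcr\.io/)' \
  | sort -u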
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
I think we've only seen very minor issues upstream in Kubernetes; it's unclear that we need to prioritize anything further here. For downstream concerns, we will finish migrating any images used by e2e.test in kubernetes. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle rotten |
@spiffxp: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/milestone v1.21 |
Creating this as a placeholder based on discussion in Slack and yesterday's SIG Testing meeting. I'll sketch in what I remember, but will defer to @BenTheElder for a plan
Dockerhub is going to rate-limit pulls starting Nov 1st. See https://www.docker.com/pricing/resource-consumption-updates
Pull limit is:
Ideas:
We will likely need to fan out audit/change jobs to all SIGs / subprojects.
I think to start with we should ensure merge-blocking kubernetes/kubernetes jobs are safe, since they represent significant CI volume