
New Node creation, pull image from private repo fails "Forbidden" only for first 10-15 minutes of new node creation #3877

Closed
cdenneen opened this issue Nov 16, 2017 · 21 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Comments

@cdenneen

New node spins up with the following:

Failed to pull image "artifactserver.example.com/gitlab/gitlab-runner:v1.11.5": rpc error: code = 2 desc = Error response from daemon: {"message":"unknown: Forbidden"}
Error syncing pod

If I wait 10-15 minutes it eventually works.

However, if I log in to the node and run docker pull artifactserver.example.com/gitlab/gitlab-runner:v1.11.5, it pulls down with no issue, and the pod then starts within a few seconds on the next retry.

Basically, what I'm trying to understand is why there's a 10-15 minute delay when new nodes pull that image from the private registry, and why pulling it manually behaves any differently than pod creation does.

@justinsb
Member

Is the artifactserver a GCR / ECR server? Where are the credentials stored?

@cdenneen
Author

@justinsb the artifactserver is Artifactory, which is added as an insecureRegistry in the kind: Cluster configuration. Also, pulls don't require creds... just pushes.

@mikesplain
Contributor

@cdenneen What networking are you using? We ran into something similar with calico #3224.

@cdenneen
Author

@mikesplain that's it!!!! Have you found a solution? I haven't seen any traction on #3224 in a while. @chrislovecnm might know if this is being handled outside that issue?

@cdenneen
Author

@mikesplain should we switch to using something other than calico?

@mikesplain
Contributor

@cdenneen Glad to hear it! Well, it looks like we'll have a path forward soon based on #3224 (comment).

Anyway, my current workaround is a cleanup script that we schedule as a cronjob. Give me a few minutes and I'll open-source it.

@mikesplain
Contributor

@cdenneen Take a look at this. I haven't tested it directly, since I run it via a Helm chart, but it should help you out:

https://github.com/mikesplain/calico-clean

@cdenneen
Author

@mikesplain Thanks...
does the schedule have to be quoted?

work/capdev-kubernetes » kubectl create -f calico-clean.yaml
error: error converting YAML to JSON: yaml: line 8: did not find expected alphabetic or numeric character
work/capdev-kubernetes » cat -n calico-clean.yaml | grep -A2 -B2 ' 8'
     6	  labels:
     7	    role.kubernetes.io/networking: "1"
     8	spec:
     9	  schedule: */5 * * * *
    10	  concurrencyPolicy: Replace
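For reference: yes, the schedule has to be quoted. In YAML, an unquoted scalar beginning with `*` is parsed as an alias node, which is exactly the "did not find expected alphabetic or numeric character" error above on line 8's value. A minimal sketch of the fixed fragment (field names taken from the snippet above):

```yaml
spec:
  schedule: "*/5 * * * *"   # quoted so the leading * isn't parsed as a YAML alias
  concurrencyPolicy: Replace
```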

@cdenneen
Author

@mikesplain

So I got the cronjob installed, but I'm not able to find it using kubectl get cronjobs.

The API server is running with --runtime-config=batch/v2alpha1=true (had to figure that part out).

To load it I had to pass --validate=false... maybe I'm not waiting long enough.
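If it helps anyone later: rather than editing the API server flags by hand, that runtime config can be set declaratively in the kops cluster spec (a sketch; assumes a kops version that supports `runtimeConfig` under `kubeAPIServer`):

```yaml
# kops cluster spec fragment (edit via `kops edit cluster`):
# enables the batch/v2alpha1 API group needed for CronJob on this cluster version
spec:
  kubeAPIServer:
    runtimeConfig:
      batch/v2alpha1: "true"
```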

@mikesplain
Contributor

@cdenneen it's under the kube-system namespace. kubectl get cronjobs --namespace kube-system

I'm not positive this will solve your issue since you are getting some sort of response... hmm

@cdenneen
Author

Yeah, this might not be the same issue...
The issue I'm having is that a new node comes up and doesn't have the image for my StatefulSet.
The image pull fails with unknown: Forbidden:

Failed to pull image "artifactserver.example.com/gitlab/gitlab-runner:v1.11.5": rpc error: code = 2 desc = Error response from daemon: {"message":"unknown: Forbidden"}
Error syncing pod

I know it's not a connectivity or permission issue with the private repo, because if I log in to the node and run docker pull artifactserver.example.com/gitlab/gitlab-runner:v1.11.5 it pulls down without issue, and once it completes (usually by the time I refresh the Dashboard or run a get po) I can see the pods running.

@chrislovecnm
Contributor

A workaround is to add a hook that does an image pull, but that's only a workaround. Can you get us the kubelet logs?

I am guessing that this is an upstream issue btw. Anyone else agree / disagree?

@cdenneen
Author

2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc57504b499e   Pod                                 Normal    Scheduled               default-scheduler                       Successfully assigned runner-434cb7f1-project-103-concurrent-0wpc7m to ip-10-240-51-63.ec2.internal
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc57600fbcc5   Pod                                 Normal    SuccessfulMountVolume   kubelet, ip-10-240-51-63.ec2.internal   MountVolume.SetUp succeeded for volume "repo"
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc576052d1ed   Pod                                 Normal    SuccessfulMountVolume   kubelet, ip-10-240-51-63.ec2.internal   MountVolume.SetUp succeeded for volume "default-token-7b27b"
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc577f67ebcd   Pod       spec.containers{build}    Normal    Pulled                  kubelet, ip-10-240-51-63.ec2.internal   Container image "artifactserver.example.com/ruby:2.1.9" already present on machine
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc5782473e73   Pod       spec.containers{build}    Normal    Created                 kubelet, ip-10-240-51-63.ec2.internal   Created container
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc5789756acd   Pod       spec.containers{build}    Warning   Failed                  kubelet, ip-10-240-51-63.ec2.internal   Error: failed to start container "build": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/var/lib/kubelet/pods/3073c487-cbdd-11e7-9c9c-021e13f74eaa/volumes/kubernetes.io~empty-dir/repo\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/overlay/658bcf2ee80186f8257b8bbfa6811dd3466723b248f96a8ce89043c174575e5d/merged\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/overlay/658bcf2ee80186f8257b8bbfa6811dd3466723b248f96a8ce89043c174575e5d/merged/core\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc57898fbcd6   Pod       spec.containers{helper}   Normal    Pulled                  kubelet, ip-10-240-51-63.ec2.internal   Container image "gitlab/gitlab-runner-helper:x86_64-cbfcb5c" already present on machine
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc578c5037b8   Pod       spec.containers{helper}   Normal    Created                 kubelet, ip-10-240-51-63.ec2.internal   Created container
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc5790124f02   Pod       spec.containers{helper}   Normal    Started                 kubelet, ip-10-240-51-63.ec2.internal   Started container
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc579013e078   Pod                                 Warning   FailedSync              kubelet, ip-10-240-51-63.ec2.internal   Error syncing pod
2m          2m           1         runner-434cb7f1-project-103-concurrent-0wpc7m.14f7fc5afe385268   Pod       spec.containers{helper}   Normal    Killing                 kubelet, ip-10-240-51-63.ec2.internal   Killing container with id docker://helper:Need to kill Pod

@cdenneen
Author

Here is the kubelet info from the node:

kubelet.log

@justinsb justinsb added this to the 1.8.0 milestone Nov 26, 2017
@cdenneen
Author

cdenneen commented Dec 5, 2017

Does anyone know how I can add some sort of hook to my kops nodes instance group to do the "docker pull" automatically, rather than logging in to each of these nodes to get past the delay?
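One way this could look (an untested sketch using the `hooks` field on a kops InstanceGroup; the unit name here is made up):

```yaml
# kops InstanceGroup fragment (`kops edit ig nodes`): a systemd oneshot hook
# that pre-pulls the image before kubelet starts.
# "prepull-runner.service" is a hypothetical unit name.
spec:
  hooks:
  - name: prepull-runner.service
    before:
    - kubelet.service
    manifest: |
      Type=oneshot
      ExecStart=/usr/bin/docker pull artifactserver.example.com/gitlab/gitlab-runner:v1.11.5
```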

@justinsb justinsb modified the milestones: 1.8.0, 1.9 Feb 21, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2018
@justinsb justinsb modified the milestones: 1.9.0, 1.10 May 26, 2018
@cdenneen
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
