[kube-state-metrics] Failing to pull image error 403 Forbidden - pod in ImagePullBackOff #2421

Closed
gabrielbac opened this issue Sep 2, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@gabrielbac

Describe the bug

Getting this error starting today. Any idea what could be happening?

Failed to pull image "registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0": rpc error: code = Unknown desc = failed to pull and unpack image "registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry.k8s.io/v2/kube-state-metrics/kube-state-metrics/blobs/sha256:ec6e2d871c544073e0d0a2448b23f98a1aa47b7c60ae9d79ac5d94d92ea45949: 403 Forbidden

What's your helm version?

v3.9.3

What's your kubectl version?

v4.5.4

Which chart?

kube-state-metrics

What's the chart version?

2.5.0

What happened?

Can't pull the image.

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

The deployment is done via Terraform

Anything else we need to know?

No response

gabrielbac added the bug label Sep 2, 2022
@jankosecki

We have the same issue. Any news on that?

@gabrielbac
Author

@jankosecki if this helps, I downgraded the chart version to 15.1.0 to make it work. Please keep us posted if this starts working for you.

@ritvikgautam

We are also experiencing this issue since last week. We have a multi-region deployment, and this affects us only in AWS's eu-central-1 (Frankfurt) region. In five of our other regions, we do not have this issue.

The kube-prometheus-stack installation through Helm fails because the kube-state-metrics pod cannot start and ends up in ImagePullBackOff.

Failed to pull image "registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0": rpc error: code = Unknown desc = error pulling image configuration: download failed after attempts=1: error parsing HTTP 403 response body: invalid character '<' looking for beginning of value: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>//REDACTED//</RequestId><HostId>//REDACTED//</HostId></Error>"

@gabrielbac Do you mean downgrading kube-prometheus-stack chart version to 15.1.0? The latest version is 39.13.3. 15.1.0 looks like a pretty old release. I might be missing something here.

@monotek
Member

monotek commented Sep 14, 2022

Works for me:

docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
v2.6.0: Pulling from kube-state-metrics/kube-state-metrics
Digest: sha256:bdab4e49d71d272cf944c8612dff5ab1250f0fafdae45c22980286ac0c016032
Status: Image is up to date for registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0

@ritvikgautam

Interesting.

I'm unable to pinpoint what exactly the issue is.

I tried it on an EC2 instance in eu-central-1, and I get the same results. We don't have any egress firewall either, and the same configuration works in other regions.

# docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
v2.5.0: Pulling from kube-state-metrics/kube-state-metrics
36698cfa5275: Pulling fs layer
c770874a9c13: Pulling fs layer
error pulling image configuration: error parsing HTTP 403 response body: invalid character '<' looking for beginning of value: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>//REDACTED//</RequestId><HostId>//REDACTED//</HostId></Error>"

# docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
v2.6.0: Pulling from kube-state-metrics/kube-state-metrics
0a602d5f6ca3: Pulling fs layer
68ad17e1eab7: Pulling fs layer
error pulling image configuration: error parsing HTTP 403 response body: invalid character '<' looking for beginning of value: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>//REDACTED//</RequestId><HostId>//REDACTED//</HostId></Error>"

Could it be that this request is reaching a faulty mirror somehow? (if it even works that way, that is)

Also @monotek, I noticed that the docker pull command didn't actually pull a new image, since it was already present on your system (Status: Image is up to date for registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0). Could you give it another try without the local cache?
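A small aside on the 403 body quoted in the pull output above: it is XML with an `<Error>` root and `<Code>`/`<Message>` children, which is the shape of error responses returned by Amazon S3, not the JSON a registry client expects. The following is a minimal Python sketch that parses that body to pull out the error code. The helper name `parse_error_body` is hypothetical, and the body string is copied from this thread with the IDs still redacted.

```python
import xml.etree.ElementTree as ET

# Error body copied from the pull output in this thread (IDs were redacted there).
BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>AccessDenied</Code><Message>Access Denied</Message>"
    "<RequestId>//REDACTED//</RequestId><HostId>//REDACTED//</HostId></Error>"
)


def parse_error_body(body: str) -> dict:
    """Return a child-tag -> text mapping for an <Error> XML document."""
    # Encode to bytes so the parser honours the encoding declaration itself.
    root = ET.fromstring(body.encode("utf-8"))
    if root.tag != "Error":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {child.tag: (child.text or "") for child in root}


fields = parse_error_body(BODY)
print(fields["Code"], "-", fields["Message"])
```

An `AccessDenied` code in this format suggests the request was rejected on the storage side (or by something in front of it), which is worth checking before suspecting the registry itself.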

@monotek
Member

monotek commented Sep 14, 2022

docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
v2.6.0: Pulling from kube-state-metrics/kube-state-metrics
0a602d5f6ca3: Pull complete 
68ad17e1eab7: Pull complete 
Digest: sha256:bdab4e49d71d272cf944c8612dff5ab1250f0fafdae45c22980286ac0c016032
Status: Downloaded newer image for registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0

@ritvikgautam

Thanks @monotek !

I came across this doc for registry.k8s.io, which explains why we might be seeing this discrepancy.

So in conjunction with CNCF staff we are trying to put together a plan to host copies of images and binaries nearer to where they are used rather than incur cross-cloud costs.

One part of this plan is to setup a proxy OCI service, that can identify where the traffic is coming from and redirect to the nearest image layer/repository. This is why we are setting up a new service using what we call an oci-proxy for everyone to use. This proxy will identify traffic coming from, for example, a certain AWS region, then will setup a HTTP redirect to a source in that AWS region. If we get traffic from GKE/GCP or we don't know where the traffic is coming from, it will still redirect to the current infrastructure (k8s.gcr.io).

So this may be affecting only traffic that originates from AWS in the eu-central-1 region.
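To check which backend a given client is being redirected to, one option is to capture the URL the pull finally hits (it appears in the error message, or in the `Location` header of the redirect) and look at its host. The sketch below is a rough Python illustration of that idea. The function name `classify_backend` and the host-suffix rules are assumptions made for this example, not something the registry documents, and the S3 URL is a hypothetical value built from the bucket prefix mentioned in this thread.

```python
from urllib.parse import urlparse


def classify_backend(url: str) -> str:
    """Guess the hosting side of a redirect target from its hostname.

    The suffix checks are illustrative assumptions, not an official list.
    """
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".amazonaws.com"):
        return "aws"
    if host.endswith("-docker.pkg.dev") or host == "gcr.io" or host.endswith(".gcr.io"):
        return "gcp"
    return "unknown"


# URL taken from an error message later in this thread.
GCP_URL = (
    "https://asia-northeast2-docker.pkg.dev/v2/k8s-artifacts-prod/images/"
    "kube-state-metrics/kube-state-metrics/manifests/v2.8.0"
)
# Hypothetical S3 URL, for illustration only.
S3_URL = "https://prod-registry-k8s-io-eu-central-1.s3.eu-central-1.amazonaws.com/example-blob"

for url in (GCP_URL, S3_URL):
    print(classify_backend(url), url)
```

If the target is on the AWS side, anything that filters S3 traffic in the VPC (for example an endpoint policy) becomes a candidate cause for a 403.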

@gabrielbac
Author

@ritvikgautam
This is still happening to me in us-east-2!

@gabrielbac
Author

It turns out the images are stored in S3, and we had a VPC endpoint policy blocking access to them.

Closing this issue.

@amall015

amall015 commented Nov 2, 2022

@gabrielbac can you share what you updated your S3 VPC endpoint policy to in order to allow access? We are experiencing the same issue, but we cannot remove the policy for security reasons.

EDIT: in case anyone else stumbles on this, we were able to get past this by using the following ARN in our S3 policy:
arn:aws:s3:::prod-registry-k8s-io*
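For anyone wondering what such a policy entry might look like, here is a minimal Python sketch that builds an allow statement for that ARN and appends it to an existing endpoint policy document. Only the ARN comes from this thread. The `Sid`, the `s3:GetObject` action, the `"*"` principal, and the helper names are assumptions for illustration, so adapt them to your own policy before using anything like this.

```python
import json

# ARN pattern reported as working in this thread.
REGISTRY_ARN = "arn:aws:s3:::prod-registry-k8s-io*"


def registry_read_statement(resource_arn: str = REGISTRY_ARN) -> dict:
    """Build an allow statement for reading registry blobs (assumed shape)."""
    return {
        "Sid": "AllowRegistryK8sIoRead",  # hypothetical identifier
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],  # assumption: read-only access is enough
        "Resource": [resource_arn],
    }


def add_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with the statement appended."""
    merged = dict(policy)
    merged["Statement"] = list(policy.get("Statement", [])) + [statement]
    return merged


# Placeholder for your current endpoint policy; existing statements are kept.
existing = {"Version": "2012-10-17", "Statement": []}
updated = add_statement(existing, registry_read_statement())
print(json.dumps(updated, indent=2))
```

Because the ARN ends in `*`, the pattern covers every bucket whose name starts with `prod-registry-k8s-io` as well as the objects inside them, so one statement can cover multiple regional buckets.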

@powerumc

This problem also occurred in the AWS ap-northeast-2 region (Seoul, South Korea).

In our EKS cluster, the pod prometheus-kube-state-metrics-6fcf5978bf-pc8c4 is stuck at 0/1 ImagePullBackOff.
On an EC2 instance in ap-northeast-2, a manual pull fails as well:

docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0
Error response from daemon: Head "https://asia-northeast2-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-state-metrics/kube-state-metrics/manifests/v2.8.0": dial tcp: lookup asia-northeast2-docker.pkg.dev on 10.0.0.2:53: no such host

@klausyoum

@gabrielbac can you share what you updated your S3 VPC endpoint policy to in order to allow access? We are experiencing the same issue, but we cannot remove the policy for security reasons.

EDIT: in case anyone else stumbles on this, we were able to get past this by using the following ARN in our S3 policy: arn:aws:s3:::prod-registry-k8s-io*

@amall015 Thank you so much. I also added arn:aws:s3:::prod-registry-k8s-io* to my S3 policy, and the issue was resolved. It works for me. 😊

@kemilad

kemilad commented Apr 10, 2023

@amall015 Thank you very much; I've added this ARN (arn:aws:s3:::prod-registry-k8s-io*) to my endpoint policy and the issue was resolved.
