REQUEST: New Shared Read Only GitHub Token For Jobs #4433

Closed

MadhavJivrajani opened this issue Sep 1, 2023 · 12 comments
Labels
  • area/github-management: Issues or PRs related to GitHub Management subproject
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • sig/contributor-experience: Categorizes an issue or PR as relevant to SIG Contributor Experience.
  • sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.
  • sig/testing: Categorizes an issue or PR as relevant to SIG Testing.

Comments

@MadhavJivrajani (Contributor) commented Sep 1, 2023

Context

There has been an uptick in projects seeing flakiness in tests that run in either the GKE or EKS Prow build clusters because those tests hit GitHub rate limits.

Issues and discussions around this:

It seems like these jobs pull the artifacts/files they need using the GitHub API. It's worth noting that cloning a repository itself does not count against the rate limit: https://github.com/orgs/community/discussions/44515.
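
For illustration, here is a minimal Go sketch of the kind of REST call such a job might make: fetching a single file through the contents API (the repository and path are arbitrary examples) and printing the rate-limit headers GitHub returns. A plain git clone of the same repository would not draw down this quota.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET /repos/{owner}/{repo}/contents/{path} is a REST API call and counts
	// against the caller's rate limit; the repo and path here are just examples.
	url := "https://api.github.com/repos/kubernetes/test-infra/contents/README.md"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	// Unauthenticated callers share a small per-IP quota; these headers show it being drawn down.
	fmt.Println("X-RateLimit-Limit:    ", resp.Header.Get("X-RateLimit-Limit"))
	fmt.Println("X-RateLimit-Remaining:", resp.Header.Get("X-RateLimit-Remaining"))
}
```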

It has also been reported that this rate limiting has been exacerbated by moving jobs to the EKS cluster: #4165 (comment). This isn't surprising: nodes in the EKS cluster have private IPs and their traffic egresses through NAT and Internet gateways, so the number of distinct IPs hitting GitHub is very low. There is discussion in SIG K8s Infra around assigning public IPs to nodes, similar to how GKE does it: kubernetes/k8s.io#5759. This would help not only with GitHub rate limits but also with the rate limits observed while pulling from the Docker registry.

However, moving to public IPs might not prove sufficient for pull-heavy jobs, since some of them have been hitting the GitHub rate limit even on the GKE cluster (#4165).

It has also been suggested to use ghproxy to get around the GitHub rate limit: #4165 (comment). The downside is that non-prow clients might have to be changed significantly to route their requests through ghproxy (#4165 (comment)); a rough sketch of what that change looks like follows.
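
As a hypothetical sketch of that change (the GITHUB_API_ENDPOINT variable below is made up for illustration, not prow's actual configuration): the client has to stop hard-coding https://api.github.com and instead accept an injectable base URL that can point at a ghproxy instance.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newGitHubRequest builds a request against whatever base URL the caller supplies,
// so the same code can talk to api.github.com directly or go through a ghproxy instance.
func newGitHubRequest(baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}

func main() {
	// GITHUB_API_ENDPOINT is a placeholder name for this sketch: left unset, the client
	// hits GitHub directly; set to a ghproxy address, its traffic goes through the proxy.
	base := os.Getenv("GITHUB_API_ENDPOINT")
	if base == "" {
		base = "https://api.github.com"
	}
	req, err := newGitHubRequest(base, "/repos/kubernetes/test-infra", os.Getenv("GITHUB_TOKEN"))
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```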

Proposal

This issue is intended to track the discussion and decision around creating a new read-only GitHub token that can be shared (similar to how some jobs re-use bot tokens) by projects whose jobs make read-heavy use of the GitHub API.

Authenticated requests have a rate limit of 5000 requests/hour/account, which should be a sufficient aggregate limit.
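
As a small sketch of how a job (or the GitHub admin team) could keep an eye on the shared quota: the /rate_limit endpoint reports a token's remaining budget without itself counting against it. GITHUB_TOKEN below is just a placeholder for wherever the shared token ends up being exposed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// GITHUB_TOKEN is a placeholder for the shared read-only token.
	req, _ := http.NewRequest(http.MethodGet, "https://api.github.com/rate_limit", nil)
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response reports per-resource quotas; "core" is the 5000 requests/hour REST bucket.
	var body struct {
		Resources struct {
			Core struct {
				Limit     int `json:"limit"`
				Remaining int `json:"remaining"`
			} `json:"core"`
		} `json:"resources"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}
	fmt.Printf("core: %d/%d requests remaining\n", body.Resources.Core.Remaining, body.Resources.Core.Limit)
}
```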

Prior art: kubernetes/k8s.io#4259

/sig k8s-infra testing contributor-experience
/area github-management
/cc @ameukam @xmudrii @kubernetes/owners

@k8s-ci-robot added the sig/k8s-infra, sig/testing, sig/contributor-experience, and area/github-management labels on Sep 1, 2023
@sbueringer (Member) commented Sep 1, 2023

Thank you very much for opening this issue.

Just a bit more context: in Cluster API we have had flaky jobs for the last 1-2 years, but only at a rate of roughly <5%, so we didn't push the GitHub token issue with the highest priority (so I assume more IPs alone wouldn't help, as you wrote).

@sbueringer (Member) commented Sep 1, 2023

If I'm connecting the dots correctly, we would provide the token to the ProwJobs via ExternalSecrets. As far as I'm aware, ExternalSecrets are not yet available on the EKS clusters (but there is or will be a discussion about that, cc @ameukam).

@xmudrii (Member) commented Sep 1, 2023

> If I'm connecting the dots correctly, we would provide the token to the ProwJobs via ExternalSecrets. As far as I'm aware, ExternalSecrets are not yet available on the EKS clusters (but there is or will be a discussion about that, cc @ameukam).

ExternalSecrets are available in the EKS Prow build cluster. We can source secrets from the AWS Secrets Manager in the Prow AWS account and from the GCP Secrets Manager in the k8s-infra-prow-build account.
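
On the consuming side, a rough sketch (the mount path and API call below are assumptions for illustration, not the actual ProwJob configuration): the job reads the shared token from wherever the ExternalSecret-backed Secret gets mounted and attaches it to its GitHub API requests.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Assumed mount path for the ExternalSecret-backed Kubernetes Secret; a real
	// ProwJob would configure this via its volume mounts.
	raw, err := os.ReadFile("/etc/github/oauth")
	if err != nil {
		panic(err)
	}
	token := strings.TrimSpace(string(raw))

	// Example read-only call; with the shared token this counts against the
	// 5000 requests/hour budget instead of the small unauthenticated per-IP quota.
	req, _ := http.NewRequest(http.MethodGet, "https://api.github.com/repos/kubernetes/kubernetes", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
	fmt.Println("X-RateLimit-Remaining:", resp.Header.Get("X-RateLimit-Remaining"))
}
```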

@xmudrii (Member) commented Sep 1, 2023

Update: we migrated all nodes in the EKS Prow build cluster to a public subnet, so all nodes now have public IP addresses instead of routing all traffic via a NAT Gateway. That should significantly improve the situation, but if you still see an increased failure rate due to rate limits, please let us know.

@Priyankasaggu11929 (Member) commented:

Just adding this here for the record: a discussion from the Kubernetes Slack channel #sig-k8s-infra around nodes randomly freezing and failing: https://kubernetes.slack.com/archives/CCK68P2Q2/p1693476605123389

@ameukam (Member) commented Sep 14, 2023

We should probably create a new bot (k8s-contribex-ci-robot?) operated by the GitHub admin team to provide tokens requested by the community.

@mrbobbytables (Member) commented:

I don't think we need a separate account for this. With the changes made to the EKS cluster and the switch to authenticated requests, we realistically won't have other requests like this; we'll just have this one option for people to use for authenticated read-only requests.

@MadhavJivrajani (Contributor, Author) commented:

I think we can create a new account if more requests like this come up. @ameukam, can we generate a token from the k8s-infra-ci-robot account for this? The governance of this token can be under the purview of github-admins if needed. Thoughts?

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 28, 2024
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 27, 2024
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 28, 2024
@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
