REQUEST: New Shared Read Only GitHub Token For Jobs #4433

Closed

MadhavJivrajani opened this issue Sep 1, 2023 · 12 comments
Labels
  • area/github-management: Issues or PRs related to GitHub Management subproject
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • sig/contributor-experience: Categorizes an issue or PR as relevant to SIG Contributor Experience.
  • sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.
  • sig/testing: Categorizes an issue or PR as relevant to SIG Testing.

Comments

@MadhavJivrajani (Contributor) commented Sep 1, 2023

Context

There has been an uptick in projects seeing flakiness in tests that run in either the GKE or EKS Prow build clusters because those tests hit GitHub rate limits.

Issues and discussions around this:

It seems like these jobs pull the artifacts/files they need using the GitHub API. It's worth noting that cloning a repository itself does not count against the rate limit: https://github.com/orgs/community/discussions/44515.
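
For illustration, here is a minimal Go sketch of the kind of REST call such a job might make: fetching a single file through the contents API (the repository and path are arbitrary examples) and printing the rate-limit headers GitHub returns. A plain git clone of the same repository would not draw down this quota.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET /repos/{owner}/{repo}/contents/{path} is a REST API call and counts
	// against the caller's rate limit; the repo and path here are just examples.
	url := "https://api.github.com/repos/kubernetes/test-infra/contents/README.md"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	// Unauthenticated callers share a small per-IP quota; these headers show it being drawn down.
	fmt.Println("X-RateLimit-Limit:    ", resp.Header.Get("X-RateLimit-Limit"))
	fmt.Println("X-RateLimit-Remaining:", resp.Header.Get("X-RateLimit-Remaining"))
}
```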

It has also been reported that this rate limiting has been exacerbated by moving jobs to the EKS cluster: #4165 (comment). This isn't surprising: nodes in the EKS cluster have private IPs and their traffic egresses through NAT and Internet gateways, so the number of distinct IPs hitting GitHub is very low. There is discussion in SIG K8s Infra around assigning public IPs to nodes, similar to how GKE does it: kubernetes/k8s.io#5759. This would help not only with GitHub rate limits but also with the rate limits observed while pulling from the Docker registry.

However, moving to public IPs might not prove sufficient for pull-heavy jobs, since some of them have been hitting the GitHub rate limit even on the GKE cluster (#4165).

It has also been suggested to use ghproxy to get around the GitHub rate limit: #4165 (comment). The downside is that non-prow clients might have to be changed significantly to route their requests through ghproxy (#4165 (comment)); a rough sketch of what that change looks like follows.
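
As a hypothetical sketch of that change (the GITHUB_API_ENDPOINT variable below is made up for illustration, not prow's actual configuration): the client has to stop hard-coding https://api.github.com and instead accept an injectable base URL that can point at a ghproxy instance.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newGitHubRequest builds a request against whatever base URL the caller supplies,
// so the same code can talk to api.github.com directly or go through a ghproxy instance.
func newGitHubRequest(baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}

func main() {
	// GITHUB_API_ENDPOINT is a placeholder name for this sketch: left unset, the client
	// hits GitHub directly; set to a ghproxy address, its traffic goes through the proxy.
	base := os.Getenv("GITHUB_API_ENDPOINT")
	if base == "" {
		base = "https://api.github.com"
	}
	req, err := newGitHubRequest(base, "/repos/kubernetes/test-infra", os.Getenv("GITHUB_TOKEN"))
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```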

Proposal

This issue is intended to track the discussion and decision around creating a new read-only GitHub token that can be shared (similar to how some jobs re-use bot tokens) by projects whose jobs make read-heavy use of the GitHub API.

Authenticated requests have a rate limit of 5000 requests/hour/account, which should be a sufficient aggregate limit.
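
As a small sketch of how a job (or the GitHub admin team) could keep an eye on the shared quota: the /rate_limit endpoint reports a token's remaining budget without itself counting against it. GITHUB_TOKEN below is just a placeholder for wherever the shared token ends up being exposed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// GITHUB_TOKEN is a placeholder for the shared read-only token.
	req, _ := http.NewRequest(http.MethodGet, "https://api.github.com/rate_limit", nil)
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response reports per-resource quotas; "core" is the 5000 requests/hour REST bucket.
	var body struct {
		Resources struct {
			Core struct {
				Limit     int `json:"limit"`
				Remaining int `json:"remaining"`
			} `json:"core"`
		} `json:"resources"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}
	fmt.Printf("core: %d/%d requests remaining\n", body.Resources.Core.Remaining, body.Resources.Core.Limit)
}
```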

Prior art: kubernetes/k8s.io#4259

/sig k8s-infra testing contributor-experience
/area github-management
/cc @ameukam @xmudrii @kubernetes/owners

@k8s-ci-robot added the sig/k8s-infra, sig/testing, sig/contributor-experience, and area/github-management labels on Sep 1, 2023
@sbueringer (Member) commented Sep 1, 2023

Thank you very much for opening this issue.

Just a bit more context: in Cluster API we have had flaky jobs for the last 1-2 years, but only at a rate of roughly <5%, so we didn't push the GitHub token issue with the highest priority (so I assume more IPs alone wouldn't help, as you wrote).

@sbueringer (Member) commented Sep 1, 2023

If I'm connecting the dots correctly, we would provide the token to the ProwJobs via ExternalSecrets. As far as I'm aware, ExternalSecrets are not yet available on the EKS clusters (but there is or will be a discussion about that, cc @ameukam).

@xmudrii (Member) commented Sep 1, 2023

> If I'm connecting the dots correctly, we would provide the token to the ProwJobs via ExternalSecrets. As far as I'm aware, ExternalSecrets are not yet available on the EKS clusters (but there is or will be a discussion about that, cc @ameukam).

ExternalSecrets are available in the EKS Prow build cluster. We can source secrets from the AWS Secrets Manager in the Prow AWS account and from the GCP Secrets Manager in the k8s-infra-prow-build account.
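
On the consuming side, a rough sketch (the mount path and API call below are assumptions for illustration, not the actual ProwJob configuration): the job reads the shared token from wherever the ExternalSecret-backed Secret gets mounted and attaches it to its GitHub API requests.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Assumed mount path for the ExternalSecret-backed Kubernetes Secret; a real
	// ProwJob would configure this via its volume mounts.
	raw, err := os.ReadFile("/etc/github/oauth")
	if err != nil {
		panic(err)
	}
	token := strings.TrimSpace(string(raw))

	// Example read-only call; with the shared token this counts against the
	// 5000 requests/hour budget instead of the small unauthenticated per-IP quota.
	req, _ := http.NewRequest(http.MethodGet, "https://api.github.com/repos/kubernetes/kubernetes", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
	fmt.Println("X-RateLimit-Remaining:", resp.Header.Get("X-RateLimit-Remaining"))
}
```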

@xmudrii (Member) commented Sep 1, 2023

Update: we migrated all nodes in the EKS Prow build cluster to a public subnet, so all nodes now have public IP addresses instead of routing all traffic via a NAT Gateway. That should significantly improve the situation, but if you still see an increased failure rate due to rate limits, please let us know.

@Priyankasaggu11929 (Member) commented:

Just adding this here for the record: a discussion from the Kubernetes Slack channel #sig-k8s-infra around nodes randomly freezing and failing: https://kubernetes.slack.com/archives/CCK68P2Q2/p1693476605123389

@ameukam (Member) commented Sep 14, 2023

We should probably create a new bot (k8s-contribex-ci-robot?) operated by the GitHub admin team to provide tokens requested by the community.

@mrbobbytables (Member) commented:

I don't think we need a separate account for this. With the changes made to the EKS cluster and the switch to authenticated requests, we realistically won't have other requests like this; we'll just have this one option for people to use for authenticated read-only requests.

@MadhavJivrajani (Contributor, Author) commented:

I think we can create a new account if more requests like this come up. @ameukam, can we generate a token from the k8s-infra-ci-robot account for this? The governance of this token can be under the purview of github-admins if needed. Thoughts?

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 28, 2024
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 27, 2024
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 28, 2024
@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
