Migrate test-infra metrics to wg-k8s-infra #1306

Closed
5 of 6 tasks
spiffxp opened this issue Oct 6, 2020 · 24 comments
Labels
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@spiffxp
Member

spiffxp commented Oct 6, 2020

Part of migrating away from gcp-project k8s-gubernator: #1308

These queries: https://github.com/kubernetes/test-infra/tree/master/metrics

  • prow jobs that run via prow.k8s.io - these should be migrated to run on k8s-infra-prow-build (-trusted?)
  • results are stored in gs://k8s-metrics - we should use a k8s-infra owned bucket instead

Proposal for new projects/names

  • gcp project: k8s-gubernator -> kubernetes-public
  • gcs bucket: gs://k8s-metrics -> gs://k8s-project-metrics

I know of at least some people who link directly to the in-bucket JSON, e.g. http://storage.googleapis.com/k8s-metrics/flakes-latest.json. It would be helpful if we could instead serve 301 redirects to the new location of these files.
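To make the redirect idea concrete, here is a minimal sketch of the URL mapping a redirect would need, assuming the proposed bucket name above. The helper name `rewrite_metrics_url` is hypothetical and not part of test-infra, and how the redirects would actually be served is left open.

```shell
#!/bin/sh
# Hypothetical helper (not part of test-infra): map a URL in the old bucket
# to the same object path in the proposed new bucket.
OLD_PREFIX="http://storage.googleapis.com/k8s-metrics/"
NEW_PREFIX="http://storage.googleapis.com/k8s-project-metrics/"

rewrite_metrics_url() {
  case "$1" in
    "$OLD_PREFIX"*)
      # Strip the old prefix and keep the object path unchanged.
      printf '%s%s\n' "$NEW_PREFIX" "${1#"$OLD_PREFIX"}"
      ;;
    *)
      # Anything outside the old bucket is passed through untouched.
      printf '%s\n' "$1"
      ;;
  esac
}

rewrite_metrics_url "http://storage.googleapis.com/k8s-metrics/flakes-latest.json"
```

Only the bucket segment changes; object paths stay identical, so existing links would need no other edits.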

EDIT(spiffxp):

The plan

@spiffxp spiffxp changed the title Migrate gs://k8s-metrics to wg-k8s-infra Migrate test-infra metrics to wg-k8s-infra Oct 6, 2020
@spiffxp spiffxp added wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 6, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@ameukam
Member

ameukam commented Jan 8, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2021
@spiffxp
Member Author

spiffxp commented Jan 21, 2021

/assign @MushuEE
for help fleshing out steps, I think this could qualify for /help

@spiffxp
Member Author

spiffxp commented Jan 21, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 21, 2021
@spiffxp spiffxp self-assigned this Jan 21, 2021
@spiffxp spiffxp added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@ameukam
Member

ameukam commented Apr 22, 2021

/remove-lifecycle stale
/milestone clear

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@k8s-ci-robot k8s-ci-robot removed this from the v1.21 milestone Apr 22, 2021
@spiffxp
Member Author

spiffxp commented Apr 23, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 23, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 23, 2021
@spiffxp
Member Author

spiffxp commented Jul 23, 2021

/remove-lifecycle stale
/milestone v1.23

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 23, 2021
@spiffxp
Member Author

spiffxp commented Aug 2, 2021

@spiffxp
Member Author

spiffxp commented Aug 2, 2021

$ gsutil -m rsync gs://k8s-metrics gs://k8s-project-metrics
# ...
| [7.5k/7.5k files][  5.0 GiB/  5.0 GiB] 100% Done  35.2 MiB/s ETA 00:00:00
Operation completed over 7.5k objects/5.0 GiB.
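A copy like this can be previewed before any data moves. The sketch below only prints the command rather than running it; the function name `preview_sync_cmd` is hypothetical, and the `-r` flag is an addition here (the command in this comment omits it).

```shell
#!/bin/sh
# Prints the gsutil command for a bucket-to-bucket sync instead of running it.
# -n asks gsutil rsync for a dry run; -r recurses into nested prefixes.
# Assumption: -r is added for illustration; the thread's command does not use it.
preview_sync_cmd() {
  printf 'gsutil -m rsync -n -r gs://%s gs://%s\n' "$1" "$2"
}

preview_sync_cmd k8s-metrics k8s-project-metrics
```

Running the printed command lists what would be copied; dropping `-n` performs the copy.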

@spiffxp
Member Author

spiffxp commented Aug 2, 2021

Working on using the new infra via a canary job:

  • config/jobs: add metrics-bigquery-canary test-infra#23097
  • Would like to try billing bq usage against kubernetes-public after I've confirmed it works with k8s-infra-prow-build
  • No strong opinions on where we bill, just want to prototype what permissions are needed so that we can bill against a project of our choice

@ameukam
Member

ameukam commented Aug 2, 2021

$ gsutil -m rsync gs://k8s-metrics gs://k8s-project-metrics
# ...
| [7.5k/7.5k files][  5.0 GiB/  5.0 GiB] 100% Done  35.2 MiB/s ETA 00:00:00
Operation completed over 7.5k objects/5.0 GiB.

Is no regular automatic sync planned?

@spiffxp
Member Author

spiffxp commented Aug 2, 2021

If I can get away with moving the bucket vs. having to migrate all links to the new bucket, no, no periodic sync

@spiffxp
Member Author

spiffxp commented Aug 2, 2021

https://testgrid.k8s.io/wg-k8s-infra-canaries#metrics-bigquery-canary - currently failing

e.g. https://storage.googleapis.com/kubernetes-jenkins/logs/metrics-bigquery-canary/1422317015489581056/build-log.txt

BigQuery error in show operation: Unknown project 'k8s-infra-prow-build-trusted'

Hm, ok, but it works when I run it locally with my google.com account or my personal account:

$ bq query --format=prettyjson --project_id=k8s-infra-prow-build-trusted 'select count(*) from [k8s-gubernator:build.day]'
Waiting on bqjob_r5b61d60c431f26e5_0000017b08eceeab_1 ... (0s) Current status: DONE
[
  {
    "f0_": "6257"
  }
]
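The bracketed table reference in the query above is BigQuery legacy SQL syntax. As a sketch, the same reference can be rewritten for standard SQL (used with `bq query --use_legacy_sql=false`); the helper name `to_standard_table` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a legacy SQL table reference of the form
# [project:dataset.table] into the standard SQL form `project.dataset.table`.
to_standard_table() {
  printf '%s\n' "$1" | sed -e 's/^\[\(.*\):\(.*\)]$/`\1.\2`/'
}

table=$(to_standard_table '[k8s-gubernator:build.day]')

# Print (not run) the equivalent standard SQL invocation.
printf "bq query --use_legacy_sql=false --project_id=k8s-infra-prow-build-trusted 'select count(*) from %s'\n" "$table"
```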

I ended up changing this job from a bootstrap job to a decorated one; I wonder if it needs to be updated to work with workload identity.

@spiffxp
Member Author

spiffxp commented Aug 2, 2021

kubernetes/test-infra#23102 - workload identity is my best guess for now

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

Turns out the fix was granting the service account roles/bigquery.user on the project against which the query was being billed. I did that manually, so I will PR a terraform change for it in a bit.

I'm sure there's a much more locked down set of permissions we could try, such that the service account doesn't accidentally create datasets in k8s-infra-prow-build-trusted, but I think this is good enough for now.
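For reference, a manual grant of this kind could look roughly like the gcloud command printed below. This is a sketch, not the terraform change mentioned above: the service account email is a placeholder (the thread does not name the real one) and the function name `grant_bq_user_cmd` is hypothetical.

```shell
#!/bin/sh
# Placeholder service account; the thread does not name the actual one.
SA_EMAIL="example-sa@k8s-infra-prow-build-trusted.iam.gserviceaccount.com"
# Project the queries are billed against, per the bq invocation in this thread.
BILLING_PROJECT="k8s-infra-prow-build-trusted"

# Prints (does not run) the project-level IAM binding command.
grant_bq_user_cmd() {
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=roles/bigquery.user\n' "$1" "$2"
}

grant_bq_user_cmd "$BILLING_PROJECT" "$SA_EMAIL"
```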

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

Did the dance to empty out the old bucket, remove it, create a new bucket with the same name in the new GCP org/project, and sync contents back

$ gsutil -m rsync gs://k8s-metrics gs://k8s-project-metrics
$ gsutil -m rm -r gs://k8s-metrics
$ ./infra/gcp/ensure-main-projects.sh # edited to use k8s-metrics instead of k8s-project-metrics
$ gsutil -m rsync -r gs://k8s-project-metrics gs://k8s-metrics
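Because the second step deletes the original bucket, a coarse sanity check before `rm` can help. The sketch below compares object counts only (it does not compare checksums); the function names are hypothetical and `count_objects` needs gsutil and read access to both buckets.

```shell
#!/bin/sh
# count_objects BUCKET: prints the number of objects in the bucket.
# Uses the ** wildcard so nested objects are included.
count_objects() {
  gsutil ls "gs://$1/**" | wc -l
}

# safe_to_remove SRC DST: succeeds only if SRC is non-empty and both
# buckets report the same number of objects.
safe_to_remove() {
  src_count=$(count_objects "$1")
  dst_count=$(count_objects "$2")
  [ "$src_count" -gt 0 ] && [ "$src_count" -eq "$dst_count" ]
}

# Example usage (not executed here):
#   safe_to_remove k8s-metrics k8s-project-metrics && gsutil -m rm -r gs://k8s-metrics
```

The non-empty check guards against a failed listing on both sides being reported as a match of zero objects.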

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

Alright! The bucket lives in kubernetes.io, the latest job ran in kubernetes.io, and everything works:

Cleaning up:

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

$ gsutil -m rm -r gs://k8s-project-metrics
# ...
/ [7.5k/7.5k objects] 100% Done 194.46 objects/s ETA 00:00:00
Operation completed over 7.5k objects.
Removing gs://k8s-project-metrics/...

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

/milestone v1.22
Looks like we are about to wrap this up before v1.22 gets released

@spiffxp
Member Author

spiffxp commented Aug 3, 2021

#2454 has merged

Calling this done!

/close

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

#2454 has merged

Calling this done!

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.23, v1.22 Aug 6, 2021
5 participants