Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dl.k8s.io: Redirect CI URIs to Kubernetes Community infra #1857

Merged
merged 2 commits into from
Jul 2, 2021

Conversation

justaugustus
Copy link
Member

@justaugustus justaugustus commented Mar 31, 2021

Signed-off-by: Stephen Augustus [email protected]

Part of kubernetes/release#1711 and #1569

krel ci-build has been successfully building and pushing CI Kubernetes
builds for a time now.

Updating this redirect (dl.k8s.io/ci) to Kubernetes Community infra
moves us closer to no longer relying on the kubernetes-release-dev
(Google-owned) bucket.

Signed-off-by: Stephen Augustus <[email protected]>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. wg/k8s-infra labels Mar 31, 2021
@k8s-ci-robot k8s-ci-robot requested review from aojea and nikhita March 31, 2021 18:42
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 31, 2021
All Kubernetes builds that land in the 'ci' directory of k8s-release-dev
are cross builds, so this reference is no longer required.

Signed-off-by: Stephen Augustus <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 31, 2021
@justaugustus justaugustus changed the title [WIP] dl.k8s.io: Redirect CI URIs to Kubernetes Community infra dl.k8s.io: Redirect CI URIs to Kubernetes Community infra Mar 31, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2021
@justaugustus
Copy link
Member Author

/hold for Aaron's feedback

/assign @spiffxp @BenTheElder
cc: @kubernetes/release-engineering

@LappleApple @puerco @sladyn98 -- This is the PR I was referring to in today's RelEng triage session.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2021
@BenTheElder
Copy link
Member

What's the retention on these buckets currently and are the contents equivalent now?
(less concerning for this bucket I think, but in the past when migrating buckets we've tried to ensure the contents will be 1:1)

@BenTheElder
Copy link
Member

also should this wait for 1.21 wrap up ..?

@ameukam
Copy link
Member

ameukam commented Mar 31, 2021

Ref: #1569

@justaugustus
Copy link
Member Author

What's the retention on these buckets currently and are the contents equivalent now?

$ gsutil retention get gs://k8s-release-dev
gs://k8s-release-dev/ has no Retention Policy.
$ gsutil retention get gs://kubernetes-release-dev
gs://kubernetes-release-dev/ has no Retention Policy.

The contents are roughly equivalent for active release branches as the new krel-based and deprecated bootstrap build jobs have been running in tandem for a few months now.

also should this wait for 1.21 wrap up ..?

Absolutely fine for this to wait until 1.21 is wrapped.
It was one of the things on my mind during triage and I figured it'd be good to toss the PR up to discuss. :)

@spiffxp
Copy link
Member

spiffxp commented Apr 1, 2021

#525 (comment) re: retention

@justaugustus
Copy link
Member Author

justaugustus commented Apr 1, 2021

#525 (comment) re: retention

Just so we have it in both places:

From https://cloud.google.com/container-registry/docs/managing#deleting_images:

* "Do not apply Cloud Storage retention policies to storage buckets used by Container Registry. These policies to not work for managing images in Container Registry storage buckets." - so we're following the recommended guidance by not setting these

* https://github.com/sethvargo/gcr-cleaner is a not-official-Google-product that could accomplish this

@spiffxp -- This is awesome! Just curious, is this a general FYI or something you'd consider using here?

Mentioning because I think we're only concerned w/ non-container build artifacts for gs://kubernetes-release-dev, right?

@justaugustus
Copy link
Member Author

@spiffxp @BenTheElder -- How are we feeling about this now that v1.21.0 has shipped?

@spiffxp
Copy link
Member

spiffxp commented Apr 22, 2021

Ah right, brain misfired re: retention. I somehow lumped k8s-staging-ci-images into this.

So my concern with switching right now is possible increased support burden, aka handling "why is my CI job failing?" emails/issues/slack-messages. We still have (flake-induced) diffs between gs://kubernetes-release-dev and gs://k8s-release-dev (ref: #846 (comment)). Ideally, we would either:

  • address this by syncing from gs://kubernetes-release-dev to gs://k8s-release-dev (periodically or automatically)
  • automatically measure the diff and make it brightly visible

If I knew someone had bandwidth to actively babysit this or handle an increased support burden, I would feel more comfortable flipping the switch to find out.

FWIW there are also some community jobs that still need to switch over

test-infra$ go test -v -count=1 ./config/tests/jobs/jobs_test.go -run TestKubernetesE2eJobsMustExtractFromK8sInfraBuckets
# ... actual job names ...
    jobs_test.go:1206:  165/2176 jobs should be updated to pull from community-owned gcs buckets

@BenTheElder
Copy link
Member

I think we should do this in 1.21 👍

@neolit123 can you or one of the other kubadm/kinder contributors / users comment on if this looks ready for your usage?

Otherwise we primarily have kube-up and kops to consider at the moment, I think. In terms of in-tree projects. They largely look the same and I think this should "just work" ™️

@BenTheElder
Copy link
Member

BenTheElder commented Apr 22, 2021

#846 (comment)

We should either sync (one-way), move dl.k8s.io and other things aggressively (post-v1.21), or make it really really clear when and why this happens

Yes, this should be a prerequisite for flipping the switch. I don't think it should be overly challenging to setup a periodic one-way sync?

@neolit123
Copy link
Member

i think this should be fine.

both kinder and kubeadm use:

  • https://storage.googleapis.com/k8s-release-dev for CI labels and artifacts
  • https://dl.k8s.io for release labels and artifacts.

maybe we can switch from https://dl.k8s.io to the direct prod bucket eventually (to be more consistent about URLs in these tools).

@spiffxp
Copy link
Member

spiffxp commented Apr 23, 2021

Using a regex in these comparisons to exclude some paths:

  • -bazel: bazel builds in kubernetes-release-dev (there appear to be none in k8s-release-dev, unsure if we should expect some for older branches)
  • ^ci-kubernetes: some coverage jobs dumping data solely in kubernetes-release-dev, these should migrate to k8s-release-dev, no need to sync in two places
  • fast/: ci/fast is only in k8s-release-dev, no need to sync back to kubernetes-release-dev

If I list one level deep, things look flaky but not too bad:

# date: 2021-04-22T23:20:16
# old: gs://kubernetes-release-dev/ci/
# new: gs://k8s-release-dev/ci/
# exclude_regex: ^(ci-kubernetes|fast)|-bazel
# summary:
#   total  (in either)                       :    901 (100.0%)
#   common (in both)                         :    827 ( 91.8%)
#   only in gs://kubernetes-release-dev/ci/  :     50 (  5.5%)
#   only in gs://k8s-release-dev/ci/         :     24 (  2.7%)

If I list recursively though things look off:

# date: 2021-04-22T23:20:53
# old: gs://kubernetes-release-dev/ci/**
# new: gs://k8s-release-dev/ci/**
# exclude_regex: ^(ci-kubernetes|fast)|-bazel
# summary:
#   total  (in either)                         : 384958 (100.0%)
#   common (in both)                           : 334896 ( 87.0%)
#   only in gs://kubernetes-release-dev/ci/**  :  40443 ( 10.5%)
#   only in gs://k8s-release-dev/ci/**         :   9619 (  2.5%)

Stuff only in k8s-release-dev is 24 flaky builds that line up with above report

$ cat only.new.txt | cut -d/ -f1 | sort | uniq -c | wc -l
      24

Stuff only in kubernetes-release-dev is:

  • some set of 27 extra files missing for most builds (not all.. so did this start or end within our retention window?)
  • 50 flaky builds that line up with above report
$ cat only.old.txt | cut -d/ -f1 | sort | uniq -c | wc -l
     765
$ cat only.old.txt | cut -d/ -f1 | sort | uniq -c | awk '{print $1}' | sort | uniq -c
 715 27 
  42 417 
   2 444 
   6 456

The 27 files in question:

$ cat only.old.txt | grep v1.22.0-alpha.0.83+669016067d4911
v1.22.0-alpha.0.83+669016067d4911/extra/gce/configure.sh
v1.22.0-alpha.0.83+669016067d4911/extra/gce/configure.sh.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/configure.sh.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/master.yaml
v1.22.0-alpha.0.83+669016067d4911/extra/gce/master.yaml.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/master.yaml.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/node.yaml
v1.22.0-alpha.0.83+669016067d4911/extra/gce/node.yaml.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/node.yaml.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/shutdown.sh
v1.22.0-alpha.0.83+669016067d4911/extra/gce/shutdown.sh.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/shutdown.sh.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/common.psm1
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/common.psm1.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/common.psm1.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/configure.ps1
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/configure.ps1.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/configure.ps1.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/install-ssh.psm1
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/install-ssh.psm1.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/install-ssh.psm1.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/k8s-node-setup.psm1
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/k8s-node-setup.psm1.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/k8s-node-setup.psm1.sha512
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/user-profile.psm1
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/user-profile.psm1.sha256
v1.22.0-alpha.0.83+669016067d4911/extra/gce/windows/user-profile.psm1.sha512

These files were in both buckets as far back as the retention window goes:

$ for v in $(grep -v '^[^/]*.txt$' either.txt | cut -d. -f2 | uniq); do v="v1.${v}"; grep ${v}.*extra/gce/configure.sh.sha512 common.txt | head -n1 | cut -d/ -f1; done;
v1.18.16-rc.0.11+9f76669271b74b
v1.19.8-rc.0.15+b054c92f501386
v1.20.3-rc.0.17+21ad05d5d14004
v1.21.0-alpha.1.268+194e5193c45c8c

Then, they stopped appearing in k8s-release-dev (only appearing in kubernetes-release-dev) as of these builds:

$ for v in $(grep -v '^[^/]*.txt$' either.txt | cut -d. -f2 | uniq); do v="v1.${v}"; grep ${v}.*extra/gce/configure.sh.sha512 only.old.txt | head -n1 | cut -d/ -f1; done;
v1.18.16-rc.0.30+3260181ab19a81
v1.19.10-rc.0.1+2c03527428d49a
v1.20.3-rc.0.15+18194169ac684f
v1.21.0-alpha.1.260+64baf0f73f8231
v1.22.0-alpha.0.1+7146eb593126d7

Is this a change that didn't get ported back to the deprecated builds?

@justaugustus
Copy link
Member Author

@kubernetes/release-managers -- Can I ask one of you to help provide feedback on Aaron's investigation?

@spiffxp
Copy link
Member

spiffxp commented Jun 9, 2021

Getting current: slightly better (common went from 87.0% to 90.1%), but still not where we need to be

# date: 2021-06-09T15:28:16
# old: gs://kubernetes-release-dev/ci/**
# new: gs://k8s-release-dev/ci/**
# exclude_regex: ^(ci-kubernetes|fast)|-bazel
# summary:
#   total  (in either)                         : 366058 (100.0%)
#   common (in both)                           : 329984 ( 90.1%)
#   only in gs://kubernetes-release-dev/ci/**  :  27151 (  7.4%)
#   only in gs://k8s-release-dev/ci/**         :   8923 (  2.4%)

The questions in #1857 (comment) remain unanswered.

@spiffxp
Copy link
Member

spiffxp commented Jun 9, 2021

/sig testing
/sig release
/area artifacts
/milestone v1.22

@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/release Categorizes an issue or PR as relevant to SIG Release. labels Jun 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 9, 2021
@k8s-ci-robot k8s-ci-robot added the area/artifacts Issues or PRs related to the hosting of release artifacts for subprojects label Jun 9, 2021
@spiffxp
Copy link
Member

spiffxp commented Jul 2, 2021

A fun wrinkle I realized while asking myself "wait, why don't I just use gustil rsync for this" when trying to clean up my weird diff report script for PR.

The files in each bucket that have matching filenames are not bit-for-bit identical files, nor do they have matching timestamps, since they're built in two different jobs at two slightly different times. So rsync will by default try to overwrite basically everything.

@spiffxp
Copy link
Member

spiffxp commented Jul 2, 2021

I am FWIW willing to flip the switch and find out what happens over what will be for some a long holiday weekend. I realize code freeze is coming up, but the next few days should be quiet, and I can be available to revert if need be.

@justaugustus
Copy link
Member Author

I am FWIW willing to flip the switch and find out what happens over what will be for some a long holiday weekend. I realize code freeze is coming up, but the next few days should be quiet, and I can be available to revert if need be.

In favor of this.
I can keep an eye out over the weekend as well.

Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm
Going to flip the switch and see what happens. I will be available to revert if things blow up. Mindful of impacting the run up to code freeze.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justaugustus, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2021
@justaugustus
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2021
@spiffxp
Copy link
Member

spiffxp commented Jul 2, 2021

Opened #2291 with a script I've been using to keep an eye on the delta between gs://k8s-release-dev and gs://kubernetes-release-dev

@k8s-ci-robot k8s-ci-robot merged commit f262df3 into kubernetes:main Jul 2, 2021
@@ -158,7 +158,7 @@ data:
# Don't require /release/ if you want to get at the Kubernetes release artifacts, the common case.
rewrite ^/(v[0-9]+\.[0-9]+\.[0-9]+(-(alpha|beta|rc)\.[0-9]+)?/.*)$ https://storage.googleapis.com/kubernetes-release/release/$1 redirect;
# Provide a convenient redirect for CI (continuous integration) artifacts as well, which live in a different bucket.
rewrite ^/ci(-cross)?/?(.*)$ https://storage.googleapis.com/kubernetes-release-dev/ci$1/$2 redirect;
rewrite ^/ci/?(.*)$ https://storage.googleapis.com/k8s-release-dev/ci$1/$2 redirect;
Copy link
Member

@spiffxp spiffxp Jul 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things I noticed when trying to deploy:

Fixup PR inbound

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, regex, my old enemy.
Thanks for the cleanup, Aaron.

I'll fixup the k/release references.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #2292

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* `ci-cross` is still referenced here, and cutting .debs may be broken until this code is fixed: https://github.com/kubernetes/release/blob/6e3b30b6dd78b705474a1086f2ed7f7b8364029b/packages/deb/build.go#L291-L302

Fixed in kubernetes/release#2153.

@spiffxp
Copy link
Member

spiffxp commented Jul 3, 2021

I deployed using the contents of #2292

cd k8s.io
./deploy.sh canary
# tests passed

Deployed to prod

./deploy.sh prod
# ...
======================================================================
FAIL: test_dl (__main__.RedirTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 294, in test_dl
    ver=rand_num(), path=rand_num())
  File "./test.py", line 100, in assert_temp_redirect
    self.assert_multischeme_redirect(partial_url, expected_loc, 302, **kwargs)
  File "./test.py", line 97, in assert_multischeme_redirect
    scheme + '://' + partial_url, expected_loc, expected_code, **kwargs)
  File "./test.py", line 92, in assert_scheme_redirect
    % (url, expected_loc, resp.getheader('location')))
AssertionError: 'https://storage.googleapis.com/kubernetes-release-dev/ci/v3487/3552' != 'https://storage.googleapis.com/k8s-release-dev/ci/v3487/3552'
- https://storage.googleapis.com/kubernetes-release-dev/ci/v3487/3552
?                                 ^^^^^^^^
+ https://storage.googleapis.com/k8s-release-dev/ci/v3487/3552
?                                 ^
 :
GET "http://dl.k8s.io/ci/v3487/3552" got an unexpected redirect location:
 want: https://storage.googleapis.com/k8s-release-dev/ci/v3487/3552
 got:  https://storage.googleapis.com/kubernetes-release-dev/ci/v3487/3552

I'm going to assume this is eventually consistent, waiting a bit

@spiffxp
Copy link
Member

spiffxp commented Jul 3, 2021

Yup, good now

----------------------------------------------------------------------
Ran 25 tests in 13.379s

OK

@spiffxp
Copy link
Member

spiffxp commented Jul 3, 2021

OK so as of ~2020-07-03 00:15:00, https://dl.k8s.io/ci redirects to gs://k8s-release-dev instead of gs://kubernetes-release-dev

Any breakages, report them here. Broader notification later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/artifacts Issues or PRs related to the hosting of release artifacts for subprojects cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants