
1.14 Testgrid test jobs should match master, not 1.13 #11555

Closed
jberkus opened this issue Feb 27, 2019 · 27 comments
Labels
area/testgrid, kind/cleanup, lifecycle/rotten, sig/release
Milestone

Comments

@jberkus
Contributor

jberkus commented Feb 27, 2019

@mariantalla @spiffxp @amwat etc.

The new 1.14 Testgrid Boards were created with a set of test jobs matching the 1.13 jobs. This is wrong; the list of jobs for 1.14 should match master, as discussed at prior sig-release meetings and one Release Team meeting. This means, among other things, Blocking and Informing, not Blocking and All.

I'd just swap out the test list, but since I don't know why the boards got created with the wrong list of jobs to begin with, I wanted to make sure first that it was an accident.

/area testgrid
/milestone v1.14
/kind cleanup

@jberkus jberkus added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 27, 2019
@k8s-ci-robot
Contributor

@jberkus: You must be a member of the kubernetes/kubernetes-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact the milestone maintainers and have them propose you as an additional delegate for this responsibility.

In response to this:

@mariantalla @spiffxp @amwat etc.

The new 1.14 Testgrid Boards were created with a set of test jobs matching the 1.13 jobs. This is wrong; the list of jobs for 1.14 should match master, as discussed at prior sig-release meetings and one Release Team meeting. This means, among other things, Blocking and Informing, not Blocking and All.

I'd just swap out the test list, but since I don't know why the boards got created with the wrong list of jobs to begin with, I wanted to make sure first that it was an accident.

/area testgrid
/milestone v1.14
/kind cleanup

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spiffxp
Member

spiffxp commented Feb 27, 2019

/milestone v1.14
/area release-team
/sig release

@k8s-ci-robot k8s-ci-robot added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Feb 27, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Feb 27, 2019
@krzyzacy
Member

oh dohhhh that's still not unified...

@jberkus the current workflow is we have a script to promote the version for all jobs, like:
1.10 -> 1.11
1.11 -> 1.12
1.12 -> 1.13
1.13 -> 1.14

Looks like that's not valid anymore and we should update the workflow in the test-infra docs.
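For context, each release-branch board is just an entry in the Testgrid config that maps dashboard tabs to test groups, so "promoting" a version amounts to bumping the version in the dashboard name and in the test_group_name references. A rough sketch of a promoted 1.14 entry (tab and test group names here are illustrative, not the actual 1.14 job list):

dashboards:
- name: sig-release-1.14-blocking
  dashboard_tab:
  # illustrative tabs; the real list is whatever jobs the board should carry
  - name: build-1.14
    test_group_name: ci-kubernetes-build-1.14
  - name: gce-cos-default-1.14
    test_group_name: ci-kubernetes-e2e-gce-cos-k8sbeta-default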

@krzyzacy
Member

well - @jberkus I think the issue is that we still don't have equivalent jobs for master and the release branches (e.g. packages-pushed-master), and we still have pending work to remove the kubeadm jobs from the release-blocking dashboards.

@spiffxp
Member

spiffxp commented Feb 27, 2019

I honestly forget if my blocking/informing proposal was meant to propagate to all boards or just master. I will look. I feel like linting this could help us catch it, if/when we are at a point where the job list is consistent.

@jberkus
Contributor Author

jberkus commented Feb 27, 2019

@krzyzacy so sounds like I need to do specific surgery on the 1.14 boards then?

@krzyzacy
Member

@jberkus there's already kubernetes/sig-release#518 - and afaik the packages-pushed-master doesn't have a branch version for a reason? cc @neolit123

@jberkus
Contributor Author

jberkus commented Feb 27, 2019

@krzyzacy Take a look at the master boards, there's a LOT more changed than that.

@krzyzacy
Member

hummm sorry, I was staring at the blocking dashboard - feel free to shuffle all the -all dashboards to become -informing, I guess? (all - blocking = informing, I assume)
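In config terms that would roughly be a rename plus dropping the tabs that already appear on the blocking board, something like (dashboard tab names here are illustrative):

dashboards:
# was: sig-release-1.14-all
- name: sig-release-1.14-informing
  dashboard_tab:
  # keep only the tabs that are not already on sig-release-1.14-blocking
  - name: gce-device-plugin-gpu-1.14
    test_group_name: ci-kubernetes-e2e-gce-device-plugin-gpu-beta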

@neolit123
Member

neolit123 commented Feb 27, 2019

@jberkus

The new 1.14 Testgrid Boards were created with a set of test jobs matching the 1.13 jobs. This is wrong; the list of jobs for 1.14 should match master

given we already have a 1.14 k/k branch shouldn't we just match all 1.14 jobs to 1.14? here is a 1.14 job that runs against release-1.14 fine, or rather was resurrected yesterday.

@jberkus there's already kubernetes/sig-release#518 - and afaik the packages-pushed-master doesn't have a branch version for a reason? cc @neolit123

the packages-* jobs from SIG Cluster Lifecycle are jobs that run over the whole support skew...
these tests are so small that running them in separate jobs/pods is a complete waste.

just have them in a generic sig-release dashboard somewhere.
also, the way they work is that they check already-existing artifacts, so it's not like they can "block" a release that is already out (the packages were already pushed).

@neolit123
Member

neolit123 commented Feb 27, 2019

actually, packages-install-deb can catch a problem from CI packages too. 🤔

ok, i'm thinking about moving the packages* jobs:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all#periodic-packages-pushed
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all#periodic-packages-install-deb

to here:
https://k8s-testgrid.appspot.com/sig-release-misc

i also explained this here:
kubernetes/sig-release#518 (comment)

is misc a better home? any objections?
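Roughly, that would just mean adding the two test groups as tabs on the misc dashboard, along the lines of (tab and test group names below are assumed from the links above, not verified):

dashboards:
- name: sig-release-misc
  dashboard_tab:
  # assumed names, copied from the sig-cluster-lifecycle-all tabs linked above
  - name: periodic-packages-pushed
    test_group_name: periodic-packages-pushed
  - name: periodic-packages-install-deb
    test_group_name: periodic-packages-install-deb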

@neolit123
Member

#11560

@jberkus
Contributor Author

jberkus commented Mar 1, 2019

@spiffxp we agreed in the sig-release meeting that it would apply to the 1.14 boards, but that we'd decide later whether to push it down to the older boards.

This was also a follow-up to the 1.13 cycle, to clean up the boards, not just master but the versioned ones as well.

@krzyzacy
Member

krzyzacy commented Mar 1, 2019

@jberkus so currently all branch dashboards are sorta unified - if we want to do sweeping changes I'd vote we still keep them consistent across the old supported releases, or else it will be painful when we bump for the next release.

@jberkus
Contributor Author

jberkus commented Mar 1, 2019

Sen, I'm up for that, but I can't decide it unilaterally.

@jberkus
Contributor Author

jberkus commented Mar 6, 2019

@mariantalla @spiffxp @tpepper how can we move ahead on this? I really don't like going into code freeze with the same mess of noncompliant tests we had in 1.13.

@mariantalla
Contributor

Hey, picking this up again (with the intention to make it for 1.14, but failing that let's try to have a way forward for 1.15)

I wonder if we can decouple the two; first, bring 1.14 to match master and then create a workflow for future releases.

I'm a bit worried about:

currently all branch dashboards are sorta unified

@krzyzacy , if I got that right, does that mean that changing the 1.14 dashboards might affect 1.13 dashboards? If yes, in what way?

#11370 seems to have created the 1.14 jobs - is the script that made the testgrid config changes somewhere in test-infra? 🤔 bump_e2e_image.sh seems to create the prow jobs but (as far as I can see) not the testgrid ones 👀

@krzyzacy
Member

changing the 1.14 dashboards might affect 1.13 dashboards? If yes, in what way?

Not right now. But when we create the 1.15 branch jobs, we don't do that by deleting 1.11 and creating 1.15; we remap the dashboards: 1.11 -> 1.12, 1.12 -> 1.13, 1.13 -> 1.14, 1.14 -> 1.15.

Feel free to do that for 1.14 for now, we can bikeshed older dashboards before 1.15 happens.

bump_e2e_image.sh seems to create the prow jobs but (as far as I can see) not the testgrid ones 👀

that script bumps the image used in the job, which has nothing to do with testgrid :-)
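i.e. it only rewrites the container image in the prow job spec, something like (job name and image tag below are made up):

periodics:
- interval: 1h
  name: ci-kubernetes-e2e-gce-cos-k8sbeta-default  # made-up job name
  spec:
    containers:
    # the image tag is what bump_e2e_image.sh rewrites; nothing here references testgrid
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20190320-deadbeef-1.14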

@mariantalla
Contributor

mariantalla commented Mar 20, 2019

I've just started (started being the keyword) work in #11849 for 1.14 specifically. I think it will be a good exercise to do once manually and then consider automating from v1.15 onwards.

Other things on my list:

@spiffxp
Member

spiffxp commented Mar 20, 2019

Not just the testgrid configs but also the job configs; we just bumped into an error where 1.13->1.14 was missing a change that is in master (ref: #11850)

@imkin

imkin commented Mar 20, 2019

Audited PR #10795 and found that the other job configs were fine except for the one reported in #11850.

Verified the same against master. The automation needs to change based on this, but that will be done asynchronously from this issue.

@spiffxp
Member

spiffxp commented May 10, 2019

/milestone v1.15
ref: #11977

@k8s-ci-robot k8s-ci-robot removed this from the v1.14 milestone May 10, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.15 milestone May 10, 2019
@spiffxp
Member

spiffxp commented Jul 9, 2019

/milestone v1.16

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.15, v1.16 Jul 9, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 6, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


8 participants