Create process for triaging/fixing kubeadm test failures #251

pipejakob · 2017-04-20T18:13:29Z

Now that we have continuously running end-to-end tests, and the desire to add more, we should create a shared process that sets expectations for community members around who should triage and fix kubeadm, or the tests themselves, when they start failing.

Here are some of the scenarios that have already come up and could have benefited from a clear policy:

1. A contributor creates a legitimate kubeadm bug and e2es immediately start failing.
   - Seems like they should fix their bug after tests start failing (time/availability permitting).
2. A contributor changes kubeadm behavior in an expected way that works, but breaks our current testing.
   - Should they be held responsible for always updating our e2e tests for their changes? We do expect people to update unit tests.
3. An upstream change to the test-infra code (e.g. bootstrap.py, kubernetes_e2e.py, etc.) breaks the kubeadm end-to-end tests.
   - Hopefully they would provide a fix if they were notified about the regression, but we don't have a clear contract between SIGs around this, and will need buy-in from the EngProd team for any expectations we have for their support.

This is one of the action items from the 1.6.0 postmortem.


jamiehannaford · 2017-05-03T13:42:01Z

I think 1 will tie into the usual feedback cycle for PRs: if a PR build fails, the contributor can incorporate changes to get the e2e tests to pass again. That's their prerogative.

For 2 I agree that it's the contributor who should take primary responsibility for calibrating their code with the existing suite of tests, regardless of whether they're unit or e2e, and making sure they verify the (new) state of the codebase. Ultimately it's the task of project maintainers to keep this in check, so they can be contacted for help whenever required.

I'm also interested in formalising the contract between test-infra and SIGs. How is EngProd contacted? Do they have a SIG? One idea would be to set up some informal SLAs that document who the best contributor to contact is if regressions occur.

Also, where would this process doc live? In this repo's README or CONTRIBUTING.md?

luxas · 2017-05-29T13:48:53Z

@pipejakob @jamiehannaford Sounds like we should create a document about this process...
Also very tightly related to #252

Is one of you up for doing that?

jamiehannaford · 2017-06-01T13:02:29Z

I can have a go at writing this

luxas · 2017-06-01T16:25:59Z

Awesome @jamiehannaford!
Seems like I can't technically assign this to you, but feel free to have a go at it

pipejakob added the kind/postmortem label Apr 25, 2017

jamiehannaford mentioned this issue Jun 2, 2017

Add triage doc #286

Merged

luxas closed this as completed in #286 Jun 15, 2017
