RFC -- Investigate concourse for release pipelines #848

hoegaarden · 2019-08-16T11:07:52Z

Disclaimer: I work for Pivotal, where concourse was born. I used concourse a lot in the past, but don't work on any of the concourse teams.

Proposal

Use concourse instead of GCB to run the release tooling.

Existing PoCs/spikes/...

branchff
Runs branchff in a pipeline instead of on individuals' laptops. In addition, this pipeline verifies the branchff'ed state by running some e2e tests against a kind-deployed cluster. The only thing missing here is pushing upstream when the tests come back green (= use one put resource).
send announcement mails for patch releases
As far as I know there is no real process for doing this; everyone on the patch release team has a different workflow (probably for c&p-ing things). Sidenote: this uses sendgrid to send the mails to the different google groups. This is currently done inline in a task, but should probably be migrated to a resource, which can then be shared with other parts of the release tooling.
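To make the branchff idea a bit more concrete, here is a rough sketch of the shape such a pipeline could take, written as a Python dict that mirrors concourse's pipeline config keys (`resources`, `jobs`, `plan`, `get`, `task`, `put`). This is not the actual PoC pipeline: the resource names, the repo URLs, the task file paths and the `fast-forwarded` output directory are all made-up placeholders. The point is only that the missing "push upstream when green" part is one `put` step at the end of the plan, which is only reached if the steps before it succeed.

```python
import json


def branchff_pipeline(repo_uri: str, branch: str) -> dict:
    """Build a concourse-style pipeline config: get, fast-forward, e2e, put.

    Every name, path and URL in here is a hypothetical placeholder.
    """
    return {
        "resources": [
            # The repo whose release branch gets fast-forwarded.
            {"name": "k8s", "type": "git",
             "source": {"uri": repo_uri, "branch": branch}},
            # Hypothetical repo holding the task definitions.
            {"name": "tooling", "type": "git",
             "source": {"uri": "https://example.invalid/release.git"}},
        ],
        "jobs": [
            {
                "name": "branchff",
                "plan": [
                    {"get": "k8s"},
                    {"get": "tooling"},
                    # Task files are assumed to live in the tooling repo.
                    {"task": "fast-forward",
                     "file": "tooling/ci/branchff.yml"},
                    {"task": "e2e-on-kind",
                     "file": "tooling/ci/e2e-kind.yml"},
                    # Steps run in order, so this push only happens when
                    # both tasks above came back green.
                    {"put": "k8s",
                     "params": {"repository": "fast-forwarded"}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    config = branchff_pipeline(
        "https://example.invalid/kubernetes.git", "release-1.16")
    print(json.dumps(config, indent=2))
```

Running the file prints the structure as JSON for inspection; a real pipeline would be written directly as YAML.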

Why concourse?

First, because I know it quite well and like it. But secondly, because it brings a lot of features which can be valuable for things we want to do as part of releasing kubernetes. I don't want to go into too much detail here right now (but am happy to discuss and show if that's useful), but the interesting features to me are:

is open source
declarative pipeline config
pipeline as a first class citizen
inputs and outputs as versioned resources
tracking of which versions of things have been integrated with which versions of other things
extendable via custom resource types
everything runs in a container
easy to clone a pipeline (might be interesting for downstream consumers/distributors/sec-team/...)
can run locally on docker, or on-prem on VMs or k8s
Credential management built in (vault and others)
multiple auth backends (e.g. can use github teams)
well tested, mature
(subjectively) great & deliberately limited UI

Breaking the current release tooling into small, distinct steps is on the roadmap anyway and has already started (to a very low degree). With the great investigation @bartsmykla put in, we have an understanding of which artefacts different steps consume or produce. This could be codified into concourse resources, which would

allow us to track which versions of which things have been used / created and give us a better way to reason about that
decouple the tasks (building things, baking container images, compiling things, ...) from pulling and pushing the artefacts, which makes the tasks simpler in general and easier to reason about and refactor in particular.
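As an illustration of what "tracking which versions of which things have been used / created" could mean, here is a toy model in Python. It is not concourse's API or data model, just a minimal sketch under the assumption that every step run records the versions it consumed and produced. The names `Version`, `Ledger`, `record` and `built_from`, as well as the step and resource names in the usage part, are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One concrete version of a resource (hypothetical model)."""
    resource: str  # e.g. "k8s-source"
    ref: str       # e.g. a git SHA or an image digest


@dataclass
class Ledger:
    """Records, per step run, which versions went in and which came out."""
    runs: list = field(default_factory=list)

    def record(self, step, inputs, outputs):
        self.runs.append((step, tuple(inputs), tuple(outputs)))

    def built_from(self, version):
        """Return every input version that transitively led to `version`."""
        seen = set()
        todo = [version]
        while todo:
            current = todo.pop()
            for _step, ins, outs in self.runs:
                if current in outs:
                    for candidate in ins:
                        if candidate not in seen:
                            seen.add(candidate)
                            todo.append(candidate)
        return seen


# Usage: two decoupled steps that only talk to each other via versions.
ledger = Ledger()
src = Version("k8s-source", "abc123")
bins = Version("binaries", "build-1")
img = Version("images", "sha256:feed")
ledger.record("compile", [src], [bins])
ledger.record("bake-images", [bins], [img])
provenance = ledger.built_from(img)  # contains both `bins` and `src`
```

Because the steps only exchange versions, either one could be refactored or replaced without touching the other, as long as it keeps producing the same kind of output.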

Concourse has a great UI, which allows us to understand which steps need to run to e.g. create a release from start to end, where we currently are, and which versions of which things are in play.

Disadvantage / Challenges

would need to be hosted and managed by the community
not too well-known outside of the CF / (Enterprise) PKS communities
limited support for non-{linux,windows} workers

Potential introduction plan

in k/release:
- move the current GCB steps to a pipeline on concourse, which essentially just runs anago in one step
- use concourse resource for all inputs and outputs for that big step
- split into smaller steps and use resources to move artefacts
- refactor single steps one by one
infrastructure:
- work with #wg-k8s-infra to get a GKE cluster
- install concourse
- figure out (in collaboration with #wg-k8s-infra and CNCF?) which cred store (vault?) we want to use and where to host it

Needed Infra

a k8s cluster for concourse itself, e.g. GKE
- probably a distinct one for the release team, as we might need/want to run priv'ed containers
potentially a vault instance somewhere

Other Remarks, Questions, ...

Is concourse the only system that has all those features? Probably not, but it is the best thing I know of / have used in prod. Alternatives with similar features might be tekton and tekton-based things, but I'm not sure about that, as I have never used any of those.
other examples of potentially interesting / k8s-related concourse pipelines:
- kind on concourse pipeline
- CFCR (kubernetes BOSH release, base for Enterprise PKS)
I added this to the release engineering meeting agenda for 2019-08-19
other Slack threads related to this:
- introducing branchff PoC
- request to run branchff pipeline

/area release-eng
/kind cleanup
/cc @kubernetes/release-engineering

javier-b-perez · 2019-08-16T15:57:03Z

Hi,

Thanks for the idea, this is the first time I have heard of concourse.
I personally think the constraint here is anago and how it operates, not GCB. If we split the logic into small units that we can run as independent containers, we can orchestrate that work anywhere (docker, GCB, concourse, ...).

I will join the meeting to hear more about this.

fejta-bot · 2019-11-14T16:45:42Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-03-08T04:28:28Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-04-07T05:11:30Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-05-07T05:55:27Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-05-07T05:55:41Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

justaugustus · 2020-05-21T17:26:02Z

Removing the rotten label so this doesn't pop up again in sweeps.
/remove-lifecycle rotten

Agreed w/ @xmudrii in kubernetes/sig-release#1069:

I think that we have become quite experienced with GCB over time, it works well for us, and we are sponsored by Google for infra. While I love experimenting with new stuff, I think this would require a significant time investment, so we should leave it aside for now until we finish the main tooling-related tasks.

k8s-ci-robot added the area/release-eng and kind/cleanup labels Aug 16, 2019

k8s-ci-robot added the lifecycle/stale label Nov 14, 2019

justaugustus added the sig/release label and removed the lifecycle/stale label Dec 9, 2019

k8s-ci-robot added the lifecycle/stale label Mar 8, 2020

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 7, 2020

xmudrii mentioned this issue Apr 25, 2020 in "Review of the Release Engineering backlog" (kubernetes/sig-release#1069, closed)

k8s-ci-robot closed this as completed May 7, 2020

k8s-ci-robot removed the lifecycle/rotten label May 21, 2020
