RFC -- Investigate concourse for release pipelines #848

Closed
hoegaarden opened this issue Aug 16, 2019 · 7 comments
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@hoegaarden
Contributor

hoegaarden commented Aug 16, 2019

Disclaimer: I work for Pivotal, where concourse was born. I used concourse a lot in the past, but don't work on any of the concourse teams.

Proposal

Use concourse instead of GCB to run the release tooling.

[Screenshots: example pipeline (kind-on-c); example resource version tracking; example job output, showing failed/passed runs, log output, tasks & inputs of a run]

Existing PoCs/spikes/...

  • branchff
    Runs branchff in a pipeline instead of on individuals' laptops. In addition, this pipeline actually verifies the branchff'ed state by running some e2e tests against a kind-deployed cluster. The only thing missing here is pushing upstream once the tests come back green (= use one put resource).
  • send announcement mails for patch releases
    As far as I know there is no real process for doing this; everyone on the patch release team has a different workflow (probably c&p-ing things). Sidenote: this uses sendgrid to send the mails to the different google groups. This is currently done inline in a task, but should probably be migrated to a resource, which can then be shared with other parts of the release tooling.
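
To make the branchff idea a bit more concrete, here is a rough sketch of the pipeline shape described above. Concourse pipelines are normally written in YAML; this sketch builds the same structure (`resources`, `jobs`, and a `plan` of `get` / `task` / `put` steps) as a Python dict and prints it as JSON. All resource names, repository URIs, branch names and task file paths are hypothetical placeholders and are not taken from the actual PoC.

```python
import json

# NOTE: every name, URI, branch and task file path here is a hypothetical
# placeholder; none of them are taken from the actual PoC pipeline.
REPO_URI = "https://example.com/kubernetes.git"


def branchff_pipeline(release_branch: str) -> dict:
    """Return a pipeline that fast-forwards, verifies on kind, then pushes."""
    return {
        "resources": [
            {"name": "k8s-master", "type": "git",
             "source": {"uri": REPO_URI, "branch": "master"}},
            {"name": "k8s-release-branch", "type": "git",
             "source": {"uri": REPO_URI, "branch": release_branch}},
            {"name": "release-tooling", "type": "git",
             "source": {"uri": "https://example.com/release.git",
                        "branch": "master"}},
        ],
        "jobs": [{
            "name": "branchff-and-verify",
            "plan": [
                # Inputs: new commits on master trigger the job.
                {"get": "k8s-master", "trigger": True},
                {"get": "k8s-release-branch"},
                {"get": "release-tooling"},
                # Fast-forward the release branch into an output directory.
                {"task": "fast-forward",
                 "file": "release-tooling/ci/branchff.yml"},
                # Assumption: running kind needs a privileged container.
                {"task": "e2e-on-kind", "privileged": True,
                 "file": "release-tooling/ci/e2e-kind.yml"},
                # The missing piece: push upstream after the tests passed.
                {"put": "k8s-release-branch",
                 "params": {"repository": "fast-forwarded"}},
            ],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(branchff_pipeline("release-1.16"), indent=2))
```

Steps in a plan run in order and a failing step stops the plan, so the final `put` would only push when the e2e task came back green.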

Why concourse?

First, because I know it quite well and like it. But secondly, because it brings a lot of features which can be valuable for things we want to do as part of releasing kubernetes. I don't want to go into too much detail here right now (but am happy to discuss and show if that's useful), but the interesting features to me are:

  • is open source
  • declarative pipeline config
  • pipeline as a first class citizen
  • inputs and outputs as versioned resources
  • tracking of which versions of things have been integrated with which versions of other things
  • extendable via custom resource types
  • everything runs in a container
  • easy to clone a pipeline (might be interesting for downstream consumers/distributors/sec-team/...)
  • can run locally on Docker, or on-prem on VMs or k8s
  • Credential management built in (vault and others)
  • multiple auth backends (e.g. can use github teams)
  • well tested, mature
  • (subjectively) great & deliberately limited UI
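
The version-tracking points in the list above can be illustrated with a toy model. This is plain Python and not how Concourse implements it: each build records the exact resource versions it consumed, so one can later ask which versions made it through a given job, and which versions of other things they were combined with. The `Build` and `BuildHistory` names and the sample version strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Build:
    job: str
    inputs: tuple      # ((resource_name, version), ...)
    succeeded: bool


@dataclass
class BuildHistory:
    builds: list = field(default_factory=list)

    def record(self, job: str, inputs: dict, succeeded: bool) -> None:
        """Remember which exact versions went into a build of `job`."""
        self.builds.append(Build(job, tuple(sorted(inputs.items())), succeeded))

    def passed(self, resource: str, job: str) -> set:
        """Versions of `resource` that were inputs to a green build of `job`."""
        return {
            version
            for build in self.builds
            if build.job == job and build.succeeded
            for name, version in build.inputs
            if name == resource
        }

    def integrated_with(self, resource: str, version: str, other: str) -> set:
        """Versions of `other` that were green together with `resource`@`version`."""
        return {
            dict(build.inputs)[other]
            for build in self.builds
            if build.succeeded
            and dict(build.inputs).get(resource) == version
            and other in dict(build.inputs)
        }


history = BuildHistory()
# Sample version strings, made up for illustration.
history.record("e2e", {"kubernetes": "abc123", "kind": "v1"}, succeeded=True)
history.record("e2e", {"kubernetes": "def456", "kind": "v1"}, succeeded=False)
history.record("e2e", {"kubernetes": "def456", "kind": "v2"}, succeeded=True)

print(sorted(history.passed("kubernetes", "e2e")))                  # → ['abc123', 'def456']
print(sorted(history.integrated_with("kind", "v1", "kubernetes")))  # → ['abc123']
```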

Breaking the current release tooling into small, distinct steps is on the roadmap anyway and has already started (to a very small degree). With the great investigation @bartsmykla put in, we have an understanding of which artefacts the different steps consume or produce. This could be codified into concourse resources, which would

  • allow us to track which versions of which things have been used / created, and give us a better way to reason about that
  • decouple the tasks (building things, baking container images, compiling things, ...) from pulling and pushing the artefacts, which makes the tasks simpler in general and easier to reason about and refactor in particular.
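
As a rough illustration of that decoupling (a sketch, not how any existing release tooling is written): Concourse resource types implement three operations, `check`, `in` and `out`, and `get` / `put` steps run the latter two. A task then only sees input and output directories. The Python below mimics that split with a hypothetical in-memory `BucketResource` and a hypothetical `build_tarball` task; none of these names come from k/release.

```python
import hashlib
import tempfile
from pathlib import Path


class BucketResource:
    """In-memory stand-in for an artefact store (hypothetical)."""

    def __init__(self):
        self._blobs = {}   # version (content digest) -> bytes
        self._order = []   # versions in order of arrival

    def check(self, since=None):
        """List versions that arrived after `since` (all if unknown or None)."""
        if since is None or since not in self._order:
            return list(self._order)
        return self._order[self._order.index(since) + 1:]

    def get(self, version, dest: Path) -> None:
        """Like `in`: materialise one version into a directory."""
        (dest / "artefact").write_bytes(self._blobs[version])

    def put(self, src: Path) -> str:
        """Like `out`: store a directory's artefact, return its version."""
        data = (src / "artefact").read_bytes()
        version = hashlib.sha256(data).hexdigest()[:12]
        if version not in self._blobs:
            self._blobs[version] = data
            self._order.append(version)
        return version


def build_tarball(input_dir: Path, output_dir: Path) -> None:
    """The task: reads inputs, writes outputs, knows nothing about storage."""
    source = (input_dir / "artefact").read_bytes()
    (output_dir / "artefact").write_bytes(b"built:" + source)


def run_step(src: BucketResource, version: str, task, dst: BucketResource) -> str:
    """Fetch one version, run the task on plain directories, push the result."""
    with tempfile.TemporaryDirectory() as i, tempfile.TemporaryDirectory() as o:
        src.get(version, Path(i))
        task(Path(i), Path(o))
        return dst.put(Path(o))
```

Because `build_tarball` only touches directories, it could be tested or refactored without any bucket or registry, and swapping the storage backend would not touch the task at all.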

Concourse has a great UI, which allows us to understand which steps need to run to e.g. create a release from start to end, where we currently are, and which versions of which things are in play.

Disadvantage / Challenges

  • would need to be hosted and managed by the community
  • not too well-known outside of the CF / (Enterprise) PKS communities
  • limited support for non-{linux,windows} workers

Potential introduction plan

  • in k/release:

    • move the current GCB steps to a pipeline on concourse, which essentially just runs anago in one step
    • use concourse resources for all inputs and outputs of that big step
    • split into smaller steps and use resources to move artefacts
    • refactor single steps one by one
  • infrastructure:

    • work with #wg-k8s-infra to get a GKE cluster
    • install concourse
    • figure out (in collaboration with #wg-k8s-infra and the CNCF?) which cred store (vault?) we want to use and where to host it

Needed Infra

  • a k8s cluster for concourse itself, e.g. GKE
    • probably a distinct one for the release team, as we might need/want to run priv'ed containers
  • potentially a vault instance somewhere

Other Remarks, Questions, ...

/area release-eng
/kind cleanup
/cc @kubernetes/release-engineering

@k8s-ci-robot k8s-ci-robot added area/release-eng Issues or PRs related to the Release Engineering subproject kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Aug 16, 2019
@javier-b-perez
Contributor

Hi,

Thanks for the idea, this is the first time I have heard of concourse.
I personally think the constraint here is anago and how it operates, not GCB. If we split the logic into small units that we can run as independent containers, we can orchestrate that work anywhere (docker, GCB, concourse, ...)

I will join the meeting to hear more about this.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2019
@justaugustus justaugustus added sig/release Categorizes an issue or PR as relevant to SIG Release. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 8, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 7, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@justaugustus
Member

Removing the rotten label so this doesn't pop up again in sweeps.
/remove-lifecycle rotten

Agreed w/ @xmudrii in kubernetes/sig-release#1069:

I think that we have got quite experienced GCB over the time, it works well for us, and we are sponsored by Google for infra. While I love experimenting with the new stuff, I think this would require a lot of time-wise investment, so we should eventually leave it aside for now until we don't finish the main tooling-related tasks.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 21, 2020

5 participants