
Add github action workflow to build pipeline base images #603

Closed

Conversation

barthy1
Member

@barthy1 barthy1 commented Oct 5, 2020

Changes

Currently the pipeline base image is built using gcr.io/kaniko-project/executor:v0.17.1, which doesn't support multi-arch builds, so it blocks multi-arch pipeline releases.

This GitHub Actions workflow can be used to build the pipeline base image in multi-arch form with docker buildx.
The architectures in the manifest are amd64, arm64, s390x, and ppc64le.
The build runs on amd64, executing the builds for the other architectures under emulation.

The setup requires two secret parameters ("GKE_SA_KEY" and "GKE_PROJECT")
to push images to the Google Cloud registry.
There are also three input parameters used to construct the image name. The default values produce the gcr.io/tekton-nightly/github.com/tektoncd/pipeline/build-base:latest image.
The workflow can be started manually at github.com or via the GitHub API.
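
As a sketch, triggering such a workflow through the GitHub API is a `workflow_dispatch` call. The workflow file name, branch, and token below are placeholders; the input names come from the workflow (`imageRegistry`, `pathToProject`, `url`) and the values shown are just the documented defaults:

```shell
# Hypothetical manual trigger via the GitHub REST API (workflow_dispatch).
# <workflow-file> and the branch name are placeholders, not taken from this PR.
curl -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  "https://api.github.com/repos/tektoncd/plumbing/actions/workflows/<workflow-file>.yaml/dispatches" \
  -d '{"ref": "<branch>", "inputs": {"imageRegistry": "gcr.io/tekton-nightly", "pathToProject": "github.com/tektoncd/pipeline", "url": "build-base:latest"}}'
```

Omitting the `inputs` object would fall back to the workflow's default values.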

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you
review them:

See the contribution guide
for more details.


Signed-off-by: Yulia Gaponenko <[email protected]>
@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dibyom
You can assign the PR to them by writing /assign @dibyom in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 5, 2020
@tekton-robot
Contributor

Hi @barthy1. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 5, 2020
@bobcatfish
Contributor

I think @vdemeester or @afrittoli were telling me about this recently but I'm wondering if using GitHub actions is what is solving our problem here or if it is the fact that the GitHub action(s) are using something other than kaniko to build with? There is no reason that we have to keep using kaniko - if using a different image / build method solves this problem we could use it.

(Basically I'm trying to make sure that there isn't some underlying issue with Tekton itself that makes Tekton not work for this scenario, because if that is the case I think it's something we'd want to fix)

@vdemeester
Member

I think @vdemeester or @afrittoli were telling me about this recently but I'm wondering if using GitHub actions is what is solving our problem here or if it is the fact that the GitHub action(s) are using something other than kaniko to build with? There is no reason that we have to keep using kaniko - if using a different image / build method solves this problem we could use it.

(Basically I'm trying to make sure that there isn't some underlying issue with Tekton itself that makes Tekton not work for this scenario, because if that is the case I think it's something we'd want to fix)

This comes from #592. We discussed this in the productivity call and we kinda decided to go the quick way to unblock this while we work on making progress (:crossed_fingers:) on having all this in Tekton. The current "problem" is not with Tekton itself but with "building multi-arch images in Kubernetes"; it doesn't have anything to do with the architecture of Tekton. As far as I know, there is no way to build multi-arch images using kaniko in a generic way. The current "way" to do it with docker (and buildkit) is to use qemu and a qemu-user-static hack that comes with a bunch of problems when run in a Kubernetes cluster (it requires privileged mode, makes some assumptions about the kernel, …). GitHub Actions do not have this problem because they run in VMs, where you can basically do whatever you want (you can be root in there).
We could run VMs in Kubernetes and do something similar to GitHub Actions, but it might take some time (can this be done in GKE, ….)

We can wait to run all this in Tekton, or we can do the same as we do with Prow: use what works while we wait and work on a completely Tekton-based solution 👼

@vdemeester
Member

/ok-to-test
/kind misc

@tekton-robot tekton-robot added kind/misc Categorizes issue or PR as a miscellaneous one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 8, 2020
docker buildx build \
  --platform linux/amd64,linux/arm64,linux/ppc64le,linux/s390x \
  --tag ${{ github.event.inputs.imageRegistry }}/${{ github.event.inputs.pathToProject }}/${{ github.event.inputs.url }} \
  --push \
  ./pipeline/images
Member

Shouldn't this be different? 🤔

Member Author

@barthy1 barthy1 Oct 8, 2020

You mean platforms? Which one would you suggest?

Member

Nah, I meant the ./pipeline/images path. Where is it in the tektoncd/plumbing repository? Shouldn't this be pointing at each image with a Dockerfile?

Member Author

So far I've created this PR to build one image, gcr.io/tekton-nightly/github.com/tektoncd/pipeline/build-base:latest: https://github.com/tektoncd/pipeline/blob/master/.ko.yaml#L5
And I thought its Dockerfile is located at https://github.com/tektoncd/pipeline/tree/master/images, so pipeline/images for the pipeline repo.

Member Author

If it works smoothly, we can build other images from the plumbing repo itself.

Member

Ohhhh, it's targeting tektoncd/pipeline, I see 😅

@bobcatfish
Contributor

Thanks for the explanation @vdemeester , I agree with moving ahead to unblock this (and think it's important we document why we are making these kinds of decisions).

@imjasonh
Member

imjasonh commented Oct 8, 2020

I'd like to avoid taking a dependency on GitHub Actions if we can avoid it, since our CI/CD setup is already sort of complicated, involving both Prow and dogfooded Tekton workflows.

If the issue at hand is that kaniko is incapable of building multi-arch images, I think there are two ways we could get around it while staying within Tekton:

  1. Make kaniko capable of building multi-arch images. If not in the general case, at least in the specific case we'd need to unblock ourselves -- it looks like we just need to install some packages, we don't even need to RUN anything besides that. @mattmoor has done some exploration here, and I think it might be closer than we think.
  2. Use docker build with a dind sidecar instead of kaniko, at least for now. I suspect this is along the lines of what GitHub Actions is doing under the hood (using docker build, unclear about whether dind is involved).

If those options have already been discussed and rejected it would be good to understand why.

@mattmoor
Member

mattmoor commented Oct 8, 2020

@imjasonh I posted the changes I used to build a multi-arch kaniko image here: GoogleContainerTools/kaniko#1452

The 0.18 release of mink I cut last night has builds of everything (incl. Tekton and kaniko) with arm64: https://github.com/mattmoor/mink/releases/tag/v0.18.0

These aren't producing multi-arch images themselves, but since kaniko can now run on arm64 it can produce arm64 images, and these can then be stitched together with docker manifest create. Perhaps you could run a multi-arch cluster with amd64 and arm64 node pools and use a Pipeline to orchestrate Tasks for each architecture w/ nodeSelectors, and then a final Task that stitches them all together.

It'd be a really powerful pattern to demonstrate with Tekton (w/ sample, blog), but I have not tried it.
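
A rough sketch of that stitching step with docker manifest (image names here are purely illustrative, not from this PR):

```shell
# Assume per-arch images were already built and pushed natively,
# e.g. example.dev/img:amd64 and example.dev/img:arm64 (illustrative names).
# Stitch them into a single multi-arch manifest list.
docker manifest create example.dev/img:latest \
  example.dev/img:amd64 \
  example.dev/img:arm64

# Annotate the non-default entry with its architecture, then push the list.
docker manifest annotate example.dev/img:latest example.dev/img:arm64 --arch arm64
docker manifest push example.dev/img:latest
```

In the pattern described above, each `docker build`/push would run as a Task pinned to the matching node pool via a nodeSelector, with the manifest commands in a final Task.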

@barthy1
Member Author

barthy1 commented Oct 8, 2020

@mattmoor agree that having a heterogeneous cluster with workers on different architectures is a very interesting use case. And I am really looking forward to seeing that (or even participating with workers on ppc64le or s390x). However GKE, where the Tekton dogfooding cluster is running, has only amd64 clusters (please correct me if I am wrong).

At this moment the question is how to build the build-base image (with amd64 Tekton) for many architectures, to be able to release Tekton for those architectures :) Without a Tekton build for a specific arch you cannot run it on a worker with that arch.

I'd also like to point out that bazel, which was used to build kaniko here, is currently released only for arm64 and amd64. So s390x, ppc64le and other possible architectures would be excluded from this picture because of the build tool dependency.

@mattmoor
Member

mattmoor commented Oct 8, 2020

I'd also like to point out that bazel, which was used to build kaniko here, is currently released only for arm64 and amd64. So s390x, ppc64le and other possible architectures would be excluded from this picture because of the build tool dependency.

Test dependency, yes, but since Bazel can cross-compile images, getting the builds effectively devolves to adding those architectures here: https://github.com/GoogleContainerTools/kaniko/pull/1452/files#diff-653e8b99fa9eba9096b9386de586f579R19-R22

@mattmoor
Member

mattmoor commented Oct 8, 2020

However GKE where the Tekton dogfooding cluster is running has only amd64 clusters (please, correct me if I am wrong).

Not wrong, I just wasn't limiting the infra to GKE. I have no clue how Tekton operates (or funds) its infra, so 🤷

@barthy1
Member Author

barthy1 commented Oct 8, 2020

Test dependency, yes, but since Bazel can cross-compile images, getting the builds effectively devolves to adding those
architectures here: https://github.com/GoogleContainerTools/kaniko/pull/1452/files#diff-653e8b99fa9eba9096b9386de586f579R19-R22

Thanks for pointing me to that, as kaniko/executor is in our plans for s390x because it is used for many Tekton tests anyway.
So if we can just cross-compile it, that sounds really promising 👍

@mattmoor
Member

mattmoor commented Oct 8, 2020

@barthy1 Yeah, no problem. I think the infra question remains, because we need a place to run kaniko natively for that to work, but I'd like to push kaniko upstream to publish a wider variety of architectures once we have some miles on the amd64/arm64 version. After all, there's effectively nothing architecture-specific in that image other than the binary, and all we're doing is cross-compiling, which is a pretty well exercised path, I'd think.

@barthy1
Member Author

barthy1 commented Oct 8, 2020

I think the infra question remains because we need a place to run kaniko natively for that to work

Well :) we can use Travis with its native hardware, but that's still an extension and yet another CI tool :(

GitHub Actions was suggested just as a faster, working way to speed up multi-arch builds as a first step, with the plan to switch to Tekton Tasks in the future anyway. However, I think this discussion is really great for collecting some other ideas and having the opportunity to verify them.

@mattmoor
Member

mattmoor commented Oct 8, 2020

As a fairly radical alternative, you could try to eliminate the Dockerfile dependency entirely by replacing the tooling dependencies with libraries.

e.g. where git is used today, start to adopt https://godoc.org/gopkg.in/src-d/go-git.v4, etc.

@vdemeester
Member

I'd like to avoid taking a dependency on GitHub Actions if we can avoid it, since our CI/CD setup is already sort of complicated, involving both Prow and dogfooded Tekton workflows.

[…]

If those options have already been discussed and rejected it would be good to understand why.

In general, we all agree that it would be better not to depend on any other CI, and we should aim to build everything in Tekton. That said, building multi-arch images can be tricky to achieve, in Kubernetes at least (aka in containers).

If the issue at hand is that kaniko is incapable of building multi-arch images, I think there are two ways we could get around it while staying within Tekton:
1. Make kaniko capable of building multi-arch images. If not in the general case, at least in the specific case we'd need to unblock ourselves -- it looks like we just need to install some packages, we don't even need to RUN anything besides that. @mattmoor has done some exploration here, and I think it might be closer than we think.
2. Use docker build with a dind sidecar instead of kaniko, at least for now. I suspect this is along the lines of what GitHub Actions is doing under the hood (using docker build, unclear about whether dind is involved).

This is, indeed, one issue, but let's give a bit more context. This is mainly about #592, and thus about building Dockerfiles for multiple architectures. We can't make any assumptions about the content of the Dockerfile (test-runner does quite a few RUN steps and other things, for example). Indeed, (1) is possible, but it needs time, and it would make kaniko run on multiple architectures, which means we need hardware (or a VM) for each given architecture, which gets us to (2). Building multi-arch images using docker (2) is more involved than a normal docker build (there is no magic). We basically need to do what this GitHub action does: run something that sets up binfmt, …, so that the correct qemu static binary is executed for the given architecture. In a cluster this is very involved; it needs privileged access and may set up the node kernel in some weird way. I did not try to do this with DinD, but even if it works, it will still need privileged.
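
For reference, the qemu/binfmt setup described above is roughly the following (a sketch of what the action does on a VM; the image tag is illustrative, and the first, privileged step is exactly what is problematic inside a cluster):

```shell
# Register qemu-user-static binfmt_misc handlers on the host kernel.
# This needs --privileged, which is the problematic part in Kubernetes.
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Create a buildx builder and cross-build under emulation.
docker buildx create --name multiarch --use
docker buildx build \
  --platform linux/amd64,linux/arm64,linux/s390x,linux/ppc64le \
  --tag example.dev/img:latest --push .
```

Any RUN step in the Dockerfile then executes through the registered qemu static binary for the target architecture, which is why arbitrary Dockerfiles work but only with privileged access.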

As a fairly radical alternative, you could try to eliminate the Dockerfile dependency entirely by replacing the tooling dependencies with libraries.

This works for tektoncd/pipeline and Go projects; it definitely doesn't for ./tekton/images in here.

In light of the comments, I feel @chmouel's comment is a good middle ground. We can create Tasks to create a gcloud compute node, set up docker on it, build the images using docker, and get the result.

@barthy1
Member Author

barthy1 commented Oct 23, 2020

Closing this PR in favour of tektoncd/pipeline#3402

@barthy1 barthy1 closed this Oct 23, 2020