
Add multi-stage dockerfile #556

Closed
wants to merge 3 commits into from

Conversation

XAMPPRocky
Collaborator

Fixes #553 at least for me. This is much faster on both macOS and Windows (it also removes the WSL requirement on Windows as you can now build with docker build .), and thanks to cargo-chef, it has much better caching. This doesn't touch the CI infra, as I figured it would be better to build up the support needed first.

@markmandel is there anything else we need to include in the image other than the binary?

@markmandel
Member

This is very interesting! This does remove several of the benefits of having the split container image build pipeline, but I also get the desire to have a faster iterative loop.

This also gives me an interesting idea on how to maintain the incremental cache on build-image powered Docker builds, without it colliding with and invalidating incremental changes from local builds and vice versa (a problem I've also run into a lot).

I'll give it a shot on my end, and we can compare and contrast approaches.

@XAMPPRocky
Collaborator Author

This does remove several of the benefits of having the split container image build pipeline, but I also get the desire to have a faster iterative loop.

What benefits are you thinking of? This essentially is that already: you can stop the build at any of the FROM stages with the --target flag.
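
As a minimal sketch of the idea (image tags, stage names, and paths here are illustrative, not necessarily what this PR contains):

FROM rust:1.62 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM debian:bullseye-slim
COPY --from=builder /app/target/release/quilkin /usr/local/bin/quilkin
ENTRYPOINT ["/usr/local/bin/quilkin"]

Running docker build --target=builder . stops after the build stage, giving you an image with the toolchain and compiled artifacts but without the final runtime image.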

@markmandel
Member

What benefits are you thinking of?

  • The binary that is generated by this file is now different from the one you ship with release (they are likely the same, but small differences can be the thing that gets you)
  • You now have two images rather than one, which you have to keep in line with each other in terms of dependencies, build toolchains, etc.
  • I would be curious to see what the image size difference is.
  • Development toolchain now deviates, as it's stepping away from the Make system, which is a harder onboarding path for new developers than having a single toolset (although it could be pulled back in).
  • To make an entire build you are building a binary twice - once through the build image, and then again through the multi-step image.
  • I would want to see it work with a distroless image (small vulnerability surface), like we have now, but I expect that would be possible.

I'm going to also have a go at adding a BUILD_LOCAL argument that will work across targets for the Makefile as well. In combination with the incremental changes I made in #561, that should bring us to basically the same place as this PR: one could choose which kind of optimisation they would like, depending on their setup, without having to deviate from a single development toolchain. That also provides incremental onboarding, as people can start with straight make + docker, and adjust as they see fit as they get more involved with the project.

@XAMPPRocky
Collaborator Author

XAMPPRocky commented Aug 9, 2022

Okay, I should clarify a few things. In general I'm largely in agreement with you; this PR is currently divergent because I started with the simplest image that could work, but we should align it with what we currently have.

  • The binary that is generated by this file is now different from the one you ship with release (they are likely the same, but small differences can be the thing that gets you)
  • You now have two images rather than one you have to keep in line with each other in terms of dependencies and build toolchains etc.

This should be changed so it’s one and the same.

Development toolchain now deviates, as it's stepping away from the Make system, which is a harder onboarding path than having a single toolset (although it could be pulled back in) for new developers.

Well, like the above, we should just have the current make commands use this dockerfile for their builds, so if you're someone who's using Make, there are zero changes for you.

I will also again note that this project will (and does) have a higher number of potential contributors on Windows, being a gamedev-related project, and make does not natively work on Windows. While WSL is an option, it's not always the best experience. For example, this repo has terrible git performance on WSL due to the presence of submodules. CMake is not really a great option either for what we're doing, which is just task running.

Since you can already develop cross-platform without make because of cargo, I think it makes sense that Unix is also not a requirement for building an image, especially for one-off or infrequent contributors for whom docker build . is something they already know, whereas they would have to learn what make commands they can use. Having make is great for CI and repeat contributors who need to do more complex tasks than simply "give me an image of the project".

  • I would be curious to see what the image size difference is.
  • I would want to see it work with a distroless image (small vulnerability surface), like we have now but I expect that would be possible.

The image is currently ~38MB according to GCR; again, the reason for Debian was just that I know glibc works with Rust there with no changes. Changing to distroless is no problem.
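
For example, only the final stage would need to change; a sketch, assuming a builder stage like the earlier one and a glibc build (gcr.io/distroless/cc-debian11 ships the glibc and libgcc runtime a default Rust binary needs):

FROM gcr.io/distroless/cc-debian11
COPY --from=builder /app/target/release/quilkin /quilkin
ENTRYPOINT ["/quilkin"]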

To make an entire build you are building a binary twice - once through the build image, and then again through the multi-step image.

This one I'm confused about: this multi-stage image removes the need for the build image, because that's essentially what all the stages before the final one are, build images that prepare and compile the binary. We can have more stages that add extra info or args for compiling for different platforms.

I'll also just note separately that we've run into strange issues with the build image step that we haven't been able to report, because we haven't found anything actionable or concrete (also, some of them I thought were just me, but I have seen other people report them recently). Things like running into illegal instruction kills on macOS, or builds just stopping with no error output on WSL.

@markmandel
Member

This should be changed so it’s one and the same.

In theory, we could pull the linux binary from the image -- but then it does diverge from how we build the Windows binary (although we build the macOS binary in a different way as well).

The build-image is handy because it encapsulates all the tooling we use - from mdbook, htmltest, gcloud, etc -- having that all in one place makes a lot of tasks very easy to implement -- so if we have a multi-stage docker image, I don't think we can ever get away from maintaining multiple images to keep them all in sync (this is why I've generally stayed away from multi-stage images where you are publishing both binaries and container images).

I will also again note that this project will (and does) have a higher number of potential contributors on Windows, being a gamedev-related project, and make does not natively work on Windows.

That is a 100% valid point for sure. I see other infrastructure projects (K8s, etc) use Make + Docker, and require Windows users to use WSL, but I expect it's a much smaller number of Windows contributors in those parts.

Slightly tangential point on this then - I was digging through cargo extensions and came across:
https://crates.io/crates/cargo-make

If we moved from Make to this (or another cargo-based task runner extension), I'm also wondering if that would move Windows users off of the need for WSL, which would hopefully fix many of the issues?

Since people will have cargo installed anyway, installing a cargo plugin doesn't seem like an onerous ask if it helps with cross-platform development. WDYT?
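
For example, the contributor flow would be roughly the following (cargo-make reads tasks from a Makefile.toml and ships predefined defaults such as build; the exact task names would depend on how we set it up):

cargo install cargo-make
cargo make build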

I'll also just note separately that we've run into strange issues with the build image step that we haven't been able to report, because we haven't found anything actionable or concrete (also, some of them I thought were just me, but I have seen other people report them recently). Things like running into illegal instruction kills on macOS, or builds just stopping with no error output on WSL.

Please do report - even if it's just a screen grab of what happened in a "Strange things that happen on macOS" and a "strange things that happen in Windows" issue, so we can collate and see if we can troubleshoot.

@XAMPPRocky
Collaborator Author

XAMPPRocky commented Aug 9, 2022

In theory, we could pull the linux binary from the image -- but then it does diverge from how we build the Windows binary (although we build the macOS binary in a different way as well).

Why would it diverge? If we set up the build stage of the docker image for cross-compilation, we can have the same image build Windows, Linux, and macOS binaries, with the only thing "changing" being the target that you provide as an argument to docker build.
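
A rough sketch of that idea, assuming the relevant linkers (e.g. mingw-w64 for the windows-gnu target) are installed in the builder stage; the stage and arg names are hypothetical:

FROM rust:1.62 AS builder
ARG TARGET=x86_64-unknown-linux-gnu
RUN rustup target add ${TARGET}
WORKDIR /app
COPY . .
RUN cargo build --release --target ${TARGET}

Then docker build --build-arg TARGET=x86_64-pc-windows-gnu --target=builder . would produce a builder image containing the Windows binary.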

The build-image is handy because it encapsulates all the tooling we use - from mdbook, htmltest, gcloud, etc -- having that all in one place makes a lot of tasks very easy to implement -- so if we have a multi-stage docker image, I don't think we can ever get away from maintaining multiple images to keep them all in sync (this is why I've generally stayed away from multi-stage images where you are publishing both binaries and container images).

I'm also not quite sure I understand why we can't do the same here? There's no reason we can't make the chef stage into a generic builder stage that contains all the tooling needed to build the binary and the documentation in later stages. The whole benefit of a multi-stage dockerfile is that you are essentially having multiple images work together, but it's all in one file, so it's easy to manage and build off of. For example, if for release we had to include extra licensing or the cargo-about page in the image, that can be added as a step after the build-binary step, and for development you can skip those extra steps by saying docker build --target=build-binary .
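
Sketching that layering (stage names hypothetical, building on a builder stage like the earlier sketch; cargo about generate is cargo-about's real subcommand):

FROM builder AS build-binary
RUN cargo build --release

FROM build-binary AS release
RUN cargo about generate about.hbs > /app/licenses.html

Development builds stop early with docker build --target=build-binary ., while release builds run the full file.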

Since people will have cargo installed anyway, installing a cargo plugin doesn't seem like an onerous ask if it helps with cross-platform development. WDYT?

Yeah, I'm totally in favour of this. I think since you have to have cargo anyway, asking someone to run cargo install is totally fine. I even think asking them to install another tool is fine, as long as it's easy to install and works out of the box, rather than make's "set up MinGW or WSL environment" requirement.

There are a few other ones I know of that you might want to have a look at. I have no real preference myself (except maybe a slight preference for earthly, because I think it's a cool new technology), so whichever tool you find works best for you is fine with me.

  • just - A self-styled "just a command runner" (also written in Rust, so it can be installed with cargo).
  • cargo xtask - Not a tool, but a pattern for having your project automation defined in Rust and callable with cargo directly.
  • earthly - Markets itself as "what if Make and Docker had a baby".

@markmandel
Member

Why would it diverge? If we set up the build stage of the docker image for cross-compilation, we can have the same image build Windows, Linux, and macOS binaries, with the only thing "changing" being the target that you provide as an argument to docker build.

That seems like a complicated multi-stage Docker image, but it does seem possible.

The macOS build image is rather complicated, hence it was easier to just use it for mac builds -- I wonder if using it as a base image would work, or if its defaults would overwrite anything else you would want to do.

Maybe a silly question - but do build steps cache between runs? Every time I've tried them I found they didn't, but maybe I've done it wrong? (Docs seem to indicate that they do?)

There's no reason we can't make the chef stage into a generic builder stage that contains all the tooling needed to build the binary and the documentation in later stages.

I think I'd have to see an example of more than building a linux container, to be honest. I'm not sure it would be a good iterative loop, because there is only one level of caching (docker). Also, I'd like to see how you would automate pulling the binary out of the docker image (I'm assuming docker cp?) -- this is what has kept me from multi-stage builds in general. I've never been able to visualise how they would work in a more complicated workflow that has several requirements above and beyond just building a container as the final artifact.
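
For reference, the usual way to automate that is docker create followed by docker cp; a sketch, with the tag and path hypothetical:

docker build --target=builder --tag=quilkin-build .
id=$(docker create quilkin-build)
docker cp "$id":/app/target/release/quilkin ./quilkin
docker rm -v "$id"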

I'm trying to work out how you would run the agones integration tests. How would you get the kubernetes and gcloud credentials into the container? If you wanted to copy them in, you would have to run the docker build command from the root of the home directory, otherwise you couldn't get access to them.

If you want to do the equivalent of make docs - which runs a process that watches the mounted documentation and refreshes the documentation page on change - you can't do that either.

@XAMPPRocky
Collaborator Author

XAMPPRocky commented Aug 10, 2022

I'm trying to work out how you would run the agones integration tests. How would you get the kubernetes and gcloud credentials into the container? If you wanted to copy them in, you would have to run the docker build command from the root of the home directory, otherwise you couldn't get access to them.

I've never been able to visualise how they would work in a more complicated workflow that has several requirements above and beyond just building a container as the final artifact.

To be clear, I'm not proposing we use docker for anything but making a container containing the build tools, and building the final artifacts. Multi-stage Docker is not a replacement for make; it's a replacement for having two separate docker images and trying to connect them together with make, like we have now. We'd still need make or another command runner for actually running things like k8s or gcloud. This would just make it so that docker build . and make build-image produce the same artifact.

@markmandel
Member

That makes sense - I think it's going to get messy, given the variety of artifacts we end up generating.

I think it will take building out a replacement for several targets that we currently have in Make right now to be able to see if it's really viable or not -- as each time I think through a target, I end up getting a little stuck, or run into wrinkles I'm not quite sure are going to work out.

But, working through it, here are a few ways it might work (in my head at least), and the ramifications therein:

Rust + General build Images

We have three active images - one for Rust + Container building (Dockerfile in the root of the project), a second one in ./build/build-image, which has everything else, and the Mac binary builder described below.

./Dockerfile:

  • Builder step with Rust installed, cargo, and related cargo tooling we use, e.g. cargo-chef, clippy, mdbook, cargo-about, etc.
  • g++-mingw-w64-x86-64 (for windows compilation), libssl-dev, etc
  • Cargo target for x86_64-pc-windows-gnu
  • Container step for making the actual container

Mac binary builder:

  • Same image we use now for compilation

/build/build-image/Dockerfile

  • The rest of the tools we use: htmltest, helm, terraform, gcloud, kubectl

Running through each of the targets:

  • make test-docs - will require one execution against the rust build image to build the docs, and one against the build-image to run htmltest. Not impossible, but an extra context switch, instead of one command.
  • make shell wouldn't really be possible with all the tools in a single place, as rust and everything else are split between two images.
  • make test-agones - might potentially break the authentication: if you are testing against GKE, the kubectl being invoked needs to call gcloud, but we're also trying to run Rust code, and they are in different images now.
  • make docs won't work, since Rust and mdbook are separate from all the python tooling. Maybe we should bring it all together at this point.
  • make build-all-binaries - we will likely have three paths now to build binaries, rather than two (which are at least closely related): (1) Linux - make the container and docker cp the binary out, (2) Windows - use the rust build image and mount the code in to build it (or maybe a different container profile from the builder is another way? Seems like a waste to build a container you would never use), (3) Mac - use the Mac build image (which is what we do now, because Mac is hard).

This feels like the wrong approach, as you lose functionality that's very useful.

Build step entirely replaces build-image

TL;DR - any time there is a Make target that used to use the build image, you would (simplified) docker build --target=builder --tag=quilkin-build . to build the same image, as the builder step has everything in /build/build-image/Dockerfile.

  • I would have to try it, but I am wondering how this would impact image caching on CI. Right now we can use the nightly build-image image as a layer cache for the builder image. I'm assuming you could do the same here as well (see the sketch after this list) - but if you can't, it would expand our CI times considerably (and they aren't that fast right now). I'm not totally clear on how much control you have over caching of build steps, but experimentation will determine if it's an issue.
  • make build-linux-binary would copy source in, then docker cp the files out.
  • make build-windows-binary would use the --target=builder image, and then mount the local files into it to generate its binary. The other option would be to create a whole new stage in the container image just for the windows binary (producing a container image that would never actually run), docker cp the file out, and delete it once done.
  • make build-macos-binary stays the same, mounting the local files in and generating the binary that way.
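
On the caching question in the first bullet: docker build can seed its layer cache from a previously pushed image via --cache-from, so the nightly image could still act as a cache. A sketch, with the tag hypothetical (note that under BuildKit, the cached image must have been built with BUILDKIT_INLINE_CACHE=1 for this to work):

docker pull quilkin-build:nightly || true
docker build --target=builder --cache-from=quilkin-build:nightly --tag=quilkin-build .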

So you do end up with a bit of a mix of approaches for doing the same thing here, and we add a bunch of complexity to our build tooling -- but maybe it's worth it just for the container image build time.

I'm wondering if the fixes in #561 and #564 mitigate the image building speed issue, such that we could maintain (relative) consistency between how each build artifact is constructed (particularly around building binaries) -- I'd be curious to know what your tests on Windows and/or Mac have produced.

@quilkin-bot
Collaborator

Build Succeeded 🥳

Build Id: 4c910dc8-c358-4a25-8126-e7bad6e4a8d4

The following development images have been built, and will exist for the next 30 days:

To build this version:

git fetch [email protected]:googleforgames/quilkin.git pull/556/head:pr_556 && git checkout pr_556
cargo build

@XAMPPRocky XAMPPRocky closed this Jul 11, 2023
@markmandel markmandel deleted the docker-build branch August 3, 2023 23:27