Automate Katib Releases #2049

Open
andreyvelich opened this issue Dec 2, 2022 · 24 comments

Comments

@andreyvelich
Member

Currently, to make Katib releases we have to follow this manual process: https://github.com/kubeflow/katib/tree/master/docs/release

We run the make release command, build and publish the release Docker images locally, and publish the Katib SDK version.
Since we build docker images locally, our release images don't support multi OS arch: https://hub.docker.com/layers/kubeflowkatib/katib-controller/v0.14.0/images/sha256-51ca80d6005010ff08853a5f7231158cb695ea899b623200076cbc01509fc0b5?context=repo.

The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

cc @tenzen-y @johnugeorge


Love this feature? Give it a 👍 We prioritize the features with the most 👍

@johnugeorge
Member

We can use a workflow_dispatch or release trigger in GHA

@tenzen-y
Member

tenzen-y commented Dec 2, 2022

@andreyvelich Thanks for proposing this.

> Since we build docker images locally, our release images don't support multi OS arch

That's right. For now, we cannot release multi-platform images by following the steps in that documentation.

> The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

I agree with you.

> We can use a workflow_dispatch or release trigger in GHA

I prefer to use the release trigger.
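
As a rough illustration of what a release-triggered job might check first, the sketch below validates the tag before anything is built. GitHub Actions exposes the triggering tag in GITHUB_REF_NAME; the tag pattern (vX.Y.Z with an optional -rc.N suffix) is an assumption based on tags such as v0.14.0, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a tag should start a release.
# Assumption: release tags look like v0.14.0 or v0.15.0-rc.0.

is_release_tag() {
  # Succeeds only for tags matching the assumed pattern.
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)?$'
}

release_or_skip() {
  if is_release_tag "$1"; then
    echo "releasing $1"
  else
    echo "skipping $1"
  fi
}

# In a GitHub Actions job, GITHUB_REF_NAME holds the tag that triggered the run.
# The fallback value is only for trying the script locally.
release_or_skip "${GITHUB_REF_NAME:-v0.14.0}"
```

A guard like this keeps an accidental tag from publishing images, whichever trigger ends up being used.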

@anencore94
Member

anencore94 commented Dec 17, 2022

> That's right. For now, we cannot release multi-platform images by following the steps in that documentation.
> The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

If we could prepare a self-hosted ARM runner (or use a GitHub Actions ARM runner at extra charge), we could automate the release. How could we prepare the ARM machine?

@tenzen-y
Member

> That's right. For now, we cannot release multi-platform images by following the steps in that documentation.
> The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

> If we could prepare a self-hosted ARM runner (or use a GitHub Actions ARM runner at extra charge), we could automate the release. How could we prepare the ARM machine?

@anencore94
I mean we need to modify the make release command, since we cannot build multi-platform images using that command.
Or does that mean we should prepare ARM runners to run tests for the ARM environment?

@anencore94
Member

anencore94 commented Dec 19, 2022

@tenzen-y I mean that if we prepare ARM runners, we could build ARM images in GitHub Actions workflows much more easily and then publish them with manifests that include both the amd64 and arm64 images. WDYT?

I'm not sure we need make release to build multi-platform images locally, but I think it would be better to publish multi-platform images at release time.
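
For illustration, the manifest step described above could look roughly like the sketch below. It assumes each native runner has already pushed an architecture-specific tag; the "-amd64"/"-arm64" suffix scheme, the function name, and the v0.0.0 tag are placeholders rather than anything Katib uses.

```shell
#!/bin/sh
# Sketch: combine per-architecture images into one multi-platform tag.
# Assumption: IMAGE:TAG-amd64 and IMAGE:TAG-arm64 were pushed beforehand
# (the "-<arch>" suffix is a hypothetical naming scheme).

manifest_cmd() {
  # $1: image repository, $2: release tag
  echo "docker buildx imagetools create --tag $1:$2 $1:$2-amd64 $1:$2-arm64"
}

# Print the command for review; pipe the output to sh to actually run it.
manifest_cmd kubeflowkatib/katib-controller v0.0.0
```

Printing the command instead of running it keeps the sketch free of side effects; a real workflow would execute it after both per-architecture pushes succeed.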

@tenzen-y
Member

tenzen-y commented Dec 19, 2022

> @tenzen-y I mean that if we prepare ARM runners, we could build ARM images in GitHub Actions workflows much more easily and then publish them with manifests that include both the amd64 and arm64 images. WDYT?

> I'm not sure we need make release to build multi-platform images locally, but I think it would be better to publish multi-platform images at release time.

@anencore94 I see. We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this.

Probably, we don't need arm64 runners for the multi-platform build.

Does that sound good to you?
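
A minimal sketch of the single-runner approach, assuming Docker with Buildx is available on the amd64 runner. The builder name, the v0.0.0 tag, and the "." build context are placeholders; Katib's real Dockerfile paths are not shown in this thread.

```shell
#!/bin/sh
# Sketch: multi-platform build on one amd64 runner using QEMU emulation.

PLATFORMS="linux/amd64,linux/arm64"

build_cmd() {
  # $1: image reference, $2: build context directory
  echo "docker buildx build --platform ${PLATFORMS} --tag $1 --push $2"
}

# One-time runner setup, commented out so the sketch has no side effects:
#   docker run --privileged --rm tonistiigi/binfmt --install arm64   # register QEMU for arm64
#   docker buildx create --use --name release-builder               # hypothetical builder name

# Print the build command for one image (placeholder tag and context).
build_cmd kubeflowkatib/katib-controller:v0.0.0 .
```

The arm64 half of the build runs under emulation here, so it trades build speed for not needing a second runner.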

@anencore94
Member

anencore94 commented Dec 19, 2022

> We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this.
> Probably, we don't need arm64 runners for the multi-platform build.

Sure, but building an ARM image on an amd64 runner would be much slower, since it uses an emulator such as QEMU to build the ARM image.
So it would be better if we could prepare an arm64 runner. However, if that is not affordable, then yes, I agree with building on the amd64 runner. @tenzen-y

@tenzen-y
Member

tenzen-y commented Dec 19, 2022

> We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this.
> Probably, we don't need arm64 runners for the multi-platform build.

> Sure, but building an ARM image on an amd64 runner would be much slower, since it uses an emulator such as QEMU to build the ARM image. So it would be better if we could prepare an arm64 runner. However, if that is not affordable, then yes, I agree with building on the amd64 runner. @tenzen-y

@anencore94 I see. That's a great idea, and I agree. It would speed up the build if we could prepare arm64 runners.
Maybe the docker buildx create --append command and a remote build instance could help us.

Using multiple native nodes provide better support for more complicated cases that are not handled by QEMU and generally have better performance. You can add additional nodes to the builder instance using the --append flag.

Assuming contexts node-amd64 and node-arm64 exist in docker context ls;

 docker buildx create --use --name mybuild node-amd64
 docker buildx create --append --name mybuild node-arm64
 docker buildx build --platform linux/amd64,linux/arm64 .

https://docs.docker.com/build/building/multi-platform/#building-multi-platform-images

The Buildx remote driver allows for more complex custom build workloads, allowing you to connect to externally managed BuildKit instances. This is useful for scenarios that require manual management of the BuildKit daemon, or where a BuildKit daemon is exposed from another source.

docker buildx create \
  --name remote-unix \
  --driver remote \
  unix://$HOME/buildkitd.sock

https://docs.docker.com/build/drivers/remote/

@johnugeorge
Member

@anencore94 @tenzen-y Currently, we are not using self-hosted runners. At some point we need to review whether we can use self-hosted runners in AWS.

@midhun1998
Member

midhun1998 commented Feb 26, 2023

I'm willing to help create the flows for the release. Do let me know if you guys need any help once we have some agreement on runners.

@andreyvelich
Member Author

Hi @midhun1998, that would be great!
Currently, we follow this manual process for our releases: https://github.com/kubeflow/katib/tree/master/docs/release.
We can discuss how to automate it (e.g. using GitHub Actions) at the upcoming AutoML + Training WG Meeting.

@anencore94
Member

I'd like to contribute to this automation too. See you at the next meeting :)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member

/remove-lifecycle stale

@andreyvelich
Member Author

/lifecycle frozen
/help


@andreyvelich:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

@andreyvelich
Member Author

/good-first-issue

This is a good issue to work on if you are familiar with GitHub Actions and can help us automate releases for Katib/Training Operator.
Feel free to propose your ideas/suggestions.

@andreyvelich:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

google-oss-prow bot added the "good first issue" (Good for newcomers) label on Mar 15, 2024

@xr-dev-saurabh

/assign

@djmcgreal-cc

For those tracking this issue for Arm64 support, it looks like it was actually implemented already.

@andreyvelich
Member Author

> For those tracking this issue for Arm64 support, it looks like it was actually implemented already.

Yes, we now publish ARM images when we run the release script, but the whole release process is not yet automated.
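
For anyone who wants to confirm which architectures a published tag carries, a small check along these lines may help. It is only a sketch: the has_arch helper is hypothetical, and it assumes the JSON printed by docker manifest inspect lists each platform with an "architecture" field.

```shell
#!/bin/sh
# Sketch: check whether a manifest list advertises a given architecture.

has_arch() {
  # $1: manifest JSON (e.g. output of `docker manifest inspect`), $2: architecture name
  printf '%s' "$1" | grep -Eq "\"architecture\": *\"$2\""
}

# Usage against a registry (needs network access, so left commented out;
# replace <tag> with a real release tag):
#   manifest="$(docker manifest inspect kubeflowkatib/katib-controller:<tag>)"
#   has_arch "$manifest" arm64 && echo "arm64 image published"
```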

@mahdikhashan
Contributor

@andreyvelich Hey, can I take over this issue? There is no PR attached.

@andreyvelich
Member Author

@mahdikhashan Thank you for your interest. We already have 2 pending PRs to automate the release process for Training Operator:
