Run e2e tests on Kind #2148

Merged: 1 commit, Sep 3, 2024

@jacobsalway jacobsalway commented Aug 29, 2024

Purpose

The existing end-to-end tests haven't been run for what seems like quite a long time. It's important that new and existing end-to-end tests run in CI as a required check before PRs are merged, so that bugs aren't introduced.

Changes:

  • Run all jobs under integration.yaml on ubuntu-latest
  • Replace the existing integration-test job with a new e2e-test job that spins up a Kubernetes cluster using Kind, builds the image and loads it into the cluster, then runs the four existing e2e tests
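
The job sequence described in the second bullet can be sketched as a small shell script. This is only an illustration under assumptions: the kind-create-cluster and kind-load-image Makefile targets appear in this PR's review diff, but the name of the target that runs the tests (e2e-test here) is hypothetical, and the script defaults to a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# Sketch of the e2e job sequence. DRY_RUN=1 (the default here) only prints commands.
set -eu

DRY_RUN="${DRY_RUN:-1}"
E2E_TARGET="${E2E_TARGET:-e2e-test}"  # hypothetical target name, not taken from the PR

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

e2e_pipeline() {
  run make kind-create-cluster   # spin up the Kind cluster
  run make kind-load-image       # build the operator image and load it into the cluster
  run make "$E2E_TARGET"         # run the end-to-end tests (assumed target name)
}

e2e_pipeline
```

Since kind-load-image lists kind-create-cluster and docker-build as prerequisites in the Makefile snippet under review, the first step is redundant and is kept here only to make the order explicit.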

Closes #1416

This is prerequisite work for adding an end-to-end test for the Yunikorn batch scheduler integration (#2098).

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

@@ -91,7 +91,7 @@ jobs:
done

build-helm-chart:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-latest
Member Author

@jacobsalway jacobsalway Aug 30, 2024

I don't see any dependency on a specific Ubuntu version in these commands, and across the projects I've seen it seems common to run actions on ubuntu-latest.

@jacobsalway jacobsalway marked this pull request as ready for review August 30, 2024 05:48
@jacobsalway
Member Author

/assign @ChenYi015 @vara-bonthu

Contributor

@ChenYi015 ChenYi015 left a comment

Thanks for fixing the e2e test CI. I have left some comments.

Comment on lines 131 to 134

values := make(map[string]interface{})
if imageTag := os.Getenv("IMAGE_TAG"); imageTag != "" {
	values["image.tag"] = imageTag
}
release, err := installAction.Run(chart, values)
Contributor

We could use the values defined in charts/spark-operator-chart/ci/ci-values.yaml, since that file is intended for the CI workflow.

Member Author

Now using Helm's chartutil package to load these values.
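
For comparison, a rough command-line analogue of this exchange, which is not what the Go test code does: the helm CLI accepts a values file via --values and a single override via --set, with --set taking precedence over the file. The sketch below only assembles the arguments. The release name in the usage comment is a placeholder, and whether the IMAGE_TAG override was kept after the change is not stated in the thread.

```shell
# Print the helm arguments one per line: always the CI values file, plus an
# image tag override when IMAGE_TAG is set and non-empty.
helm_install_args() {
  set -- --values charts/spark-operator-chart/ci/ci-values.yaml
  if [ -n "${IMAGE_TAG:-}" ]; then
    set -- "$@" --set "image.tag=${IMAGE_TAG}"
  fi
  printf '%s\n' "$@"
}

# Example (the release name "spark-operator" is a placeholder):
#   helm install spark-operator charts/spark-operator-chart $(helm_install_args)
(IMAGE_TAG=local; helm_install_args)
```

On the command line, --set expands the dotted key image.tag into nested values.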

Makefile Outdated
Comment on lines 242 to 244
ifndef ignore-not-found
  ignore-not-found = false
endif
Contributor

Why was this deleted? It is still used by some targets, e.g. uninstall-crd and deploy.

Member Author

@jacobsalway jacobsalway Aug 31, 2024

My bad, good catch. I've kept it as part of keeping the kind targets.

Makefile Outdated
Comment on lines 245 to 260

.PHONY: kind-create-cluster
kind-create-cluster: kind ## Create a kind cluster for integration tests.
	if ! $(KIND) get clusters 2>/dev/null | grep -q "^$(KIND_CLUSTER_NAME)$$"; then \
		kind create cluster --name $(KIND_CLUSTER_NAME) --config $(KIND_CONFIG_FILE) --kubeconfig $(KIND_KUBE_CONFIG); \
	fi

.PHONY: kind-load-image
kind-load-image: kind-create-cluster docker-build ## Load the image into the kind cluster.
	$(KIND) load docker-image --name $(KIND_CLUSTER_NAME) $(IMAGE)

.PHONY: kind-delete-custer
kind-delete-custer: kind ## Delete the created kind cluster.
	$(KIND) delete cluster --name $(KIND_CLUSTER_NAME) && \
	rm -f $(KIND_KUBE_CONFIG)

Contributor

The kind-related targets are used for local tests when the developer does not already have a k8s cluster, so perhaps we can keep them.

Member Author

Fair point. I've personally used k3s/k3d and minikube before as well, but kind looks fairly common among open source projects.
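
The kind-create-cluster recipe above is idempotent because it only creates the cluster when the name is not already listed. In the Makefile, $$ is Make's escape for a literal $, so the grep pattern is anchored at both ends and matches the cluster name as a whole line. A minimal sketch of the same check follows. It swaps the anchored regex for grep -Fx (literal, whole-line match) and uses a canned list with made-up cluster names in place of real kind output.

```shell
# cluster_exists NAME: succeed if NAME is exactly one of the lines on stdin.
# -F treats NAME as a literal string, -x requires a whole-line match.
cluster_exists() {
  grep -Fx -- "$1" >/dev/null
}

# Canned list standing in for the output of: kind get clusters 2>/dev/null
clusters='kind
spark-operator-dev'

if printf '%s\n' "$clusters" | cluster_exists "spark-operator"; then
  echo "exists, skipping creation"
else
  echo "missing, would run kind create cluster"
fi
```

Here the check reports the cluster as missing, because spark-operator-dev merely contains the requested name. An unanchored grep would have matched it and skipped creation.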


-      - name: Build local spark-operator docker image for minikube testing
+      - name: Build and load image to Kind cluster
         run: |
           docker build -t docker.io/kubeflow/spark-operator:local .
Contributor

We can utilize the docker-build target in the Makefile:

make docker-build IMAGE_TAG=local

Member Author

Changed this step to run make kind-load-image which also runs the docker-build target

@ChenYi015
Contributor

Remove local Kind commands from the Makefile. The existing kind-create-cluster target was broken anyway and developers may wish to use other distributions e.g. Minikube or k3d

@jacobsalway I forgot to push the charts/spark-operator-chart/kind-config.yaml file, which breaks the kind-create-cluster target in the Makefile. You can create a kind config file as follows; it can be used in both the e2e CI and local tests:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.2
- role: worker
  image: kindest/node:v1.29.2
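
One way to try this config locally, sketched under assumptions: the temporary file and the cluster name are placeholders, and the flags are the ones used by the kind-create-cluster recipe in this PR (--name, --config, --wait). The kind command is printed rather than executed, so the sketch has no side effects.

```shell
# Write the kind config from the comment above to a temporary file.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.2
- role: worker
  image: kindest/node:v1.29.2
EOF

cluster_name="spark-operator"  # placeholder cluster name

# Print the command instead of running it.
echo "kind create cluster --name $cluster_name --config $config_file --wait=1m"
```

Pinning the same node image for both roles keeps the control plane and the worker on the same Kubernetes version.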

Signed-off-by: Jacob Salway <[email protected]>
@@ -246,7 +246,7 @@ endif
.PHONY: kind-create-cluster
kind-create-cluster: kind ## Create a kind cluster for integration tests.
if ! $(KIND) get clusters 2>/dev/null | grep -q "^$(KIND_CLUSTER_NAME)$$"; then \
-    kind create cluster --name $(KIND_CLUSTER_NAME) --config $(KIND_CONFIG_FILE) --kubeconfig $(KIND_KUBE_CONFIG); \
+    kind create cluster --name $(KIND_CLUSTER_NAME) --config $(KIND_CONFIG_FILE) --kubeconfig $(KIND_KUBE_CONFIG) --wait=1m; \
Member Author

@jacobsalway jacobsalway Aug 31, 2024

Waiting for the control plane to become ready before returning success from this target seems sensible to me. I've always used this flag when running kind create cluster myself.

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Sep 3, 2024
@google-oss-prow google-oss-prow bot merged commit c810ece into kubeflow:master Sep 3, 2024
7 checks passed
@jacobsalway jacobsalway deleted the fix/e2e-test-ci branch September 3, 2024 07:32
ChenYi015 pushed a commit to ChenYi015/spark-operator that referenced this pull request Sep 9, 2024
Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit c810ece)
ChenYi015 pushed a commit to ChenYi015/spark-operator that referenced this pull request Sep 9, 2024
Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit c810ece)
Signed-off-by: Yi Chen <[email protected]>
google-oss-prow bot pushed a commit that referenced this pull request Sep 23, 2024
* Support gang scheduling with Yunikorn (#2107)

* Add Yunikorn scheduler and example

Signed-off-by: Jacob Salway <[email protected]>

* Add test cases

Signed-off-by: Jacob Salway <[email protected]>

* Add code comments

Signed-off-by: Jacob Salway <[email protected]>

* Add license comment

Signed-off-by: Jacob Salway <[email protected]>

* Inline mergeNodeSelector

Signed-off-by: Jacob Salway <[email protected]>

* Fix initial number implementation

Signed-off-by: Jacob Salway <[email protected]>

---------

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit 8fcda12)
Signed-off-by: Yi Chen <[email protected]>

* Update Makefile for building sparkctl (#2119)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 4bc6e89)
Signed-off-by: Yi Chen <[email protected]>

* fix: Add default values for namespaces to match usage descriptions  (#2128)

* fix: Add default values for namespaces to match usage descriptions

Signed-off-by: pengfei4.li <[email protected]>

* fix: remove incorrect cache settings

Signed-off-by: pengfei4.li <[email protected]>

---------

Signed-off-by: pengfei4.li <[email protected]>
Co-authored-by: pengfei4.li <[email protected]>
(cherry picked from commit 52f818d)
Signed-off-by: Yi Chen <[email protected]>

* Fix: Spark role binding did not render properly when setting spark service account name (#2135)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit a1a38ea)
Signed-off-by: Yi Chen <[email protected]>

* Reintroduce option webhook.enable (#2142)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 9e88049)
Signed-off-by: Yi Chen <[email protected]>

* Add default batch scheduler argument (#2143)

* Add default batch scheduler argument

Signed-off-by: Jacob Salway <[email protected]>

* Add helm unit test

Signed-off-by: Jacob Salway <[email protected]>

---------

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit 9cc1c02)
Signed-off-by: Yi Chen <[email protected]>

* fix: unable to set controller/webhook replicas to zero (#2147)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 1afa72e)
Signed-off-by: Yi Chen <[email protected]>

* Adding support for setting spark job namespaces to all namespaces (#2123)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit c93b0ec)
Signed-off-by: Yi Chen <[email protected]>

* Support extended kube-scheduler as batch scheduler (#2136)

* Support coscheduling with kube-scheduler plugins

Signed-off-by: Yi Chen <[email protected]>

* Add example for using kube-schulder coscheduling

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit e8d3de9)
Signed-off-by: Yi Chen <[email protected]>

* Run e2e tests on Kind (#2148)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit c810ece)
Signed-off-by: Yi Chen <[email protected]>

* Set schedulerName to Yunikorn (#2153)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit 62b4ca6)
Signed-off-by: Yi Chen <[email protected]>

* Create role and rolebinding for controller/webhook in every spark job namespace if not watching all namespaces (#2129)

watching all namespaces

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 592b649)
Signed-off-by: Yi Chen <[email protected]>

* Fix: e2e test failes due to webhook not ready (#2149)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit dee91ba)
Signed-off-by: Yi Chen <[email protected]>

* Upgrade to Go 1.23.1 (#2155)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit 10fcb8e)
Signed-off-by: Yi Chen <[email protected]>

* Upgrade to Spark 3.5.2 (#2154)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit e1b7a27)
Signed-off-by: Yi Chen <[email protected]>

* Bump sigs.k8s.io/scheduler-plugins from 0.29.7 to 0.29.8 (#2159)

Bumps [sigs.k8s.io/scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) from 0.29.7 to 0.29.8.
- [Release notes](https://github.com/kubernetes-sigs/scheduler-plugins/releases)
- [Changelog](https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/RELEASE.md)
- [Commits](kubernetes-sigs/scheduler-plugins@v0.29.7...v0.29.8)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/scheduler-plugins
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 95d202e)
Signed-off-by: Yi Chen <[email protected]>

* feat: support driver and executor pod use different priority (#2146)

* feat: support driver and executor pod use different priority

Signed-off-by: Kevin Wu <[email protected]>

* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName specifically defined, then can precedence over spec.batchSchedulerOptions.priorityClassName

Signed-off-by: Kevin Wu <[email protected]>

* feat: merge the logic of setPodPriorityClassName into addPriorityClassName

Signed-off-by: Kevin Wu <[email protected]>

* feat: support driver and executor pod use different priority

Signed-off-by: Kevin Wu <[email protected]>
Signed-off-by: Kevin.Wu <[email protected]>

* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName specifically defined, then can precedence over spec.batchSchedulerOptions.priorityClassName

Signed-off-by: Kevin Wu <[email protected]>
Signed-off-by: Kevin.Wu <[email protected]>

* feat: merge the logic of setPodPriorityClassName into addPriorityClassName

Signed-off-by: Kevin Wu <[email protected]>
Signed-off-by: Kevin.Wu <[email protected]>

* feat: add adjust pointer if is nil

Signed-off-by: Kevin.Wu <[email protected]>

* feat: remove spec.batchSchedulerOptions.priorityClassName define , split driver and executor pod priorityClass

Signed-off-by: Kevin Wu <[email protected]>

* feat: remove spec.batchSchedulerOptions.priorityClassName define , split driver and executor pod priorityClass

Signed-off-by: Kevin Wu <[email protected]>

* feat: Optimize code to avoid null pointer exceptions

Signed-off-by: Kevin.Wu <[email protected]>

* fix: remove backup crd files

Signed-off-by: Kevin.Wu <[email protected]>

* fix: remove BatchSchedulerOptions.PriorityClassName test code

Signed-off-by: Kevin Wu <[email protected]>

* fix: add driver and executor pod priorityClassName test code

Signed-off-by: Kevin Wu <[email protected]>

---------

Signed-off-by: Kevin Wu <[email protected]>
Signed-off-by: Kevin.Wu <[email protected]>
Co-authored-by: Kevin Wu <[email protected]>
(cherry picked from commit 6ae1b2f)
Signed-off-by: Yi Chen <[email protected]>

* Bump gocloud.dev from 0.37.0 to 0.39.0 (#2160)

Bumps [gocloud.dev](https://github.com/google/go-cloud) from 0.37.0 to 0.39.0.
- [Release notes](https://github.com/google/go-cloud/releases)
- [Commits](google/go-cloud@v0.37.0...v0.39.0)

---
updated-dependencies:
- dependency-name: gocloud.dev
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit e58023b)
Signed-off-by: Yi Chen <[email protected]>

* Update e2e tests (#2161)

* Add sleep buffer to ensture the webhooks are ready before running the e2e tests

Signed-off-by: Yi Chen <[email protected]>

* Remove duplicate operator image build tasks

Signed-off-by: Yi Chen <[email protected]>

* Update e2e tests

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit e6a7805)
Signed-off-by: Yi Chen <[email protected]>

* fix: webhook not working when settings spark job namespaces to empty (#2163)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 7785107)
Signed-off-by: Yi Chen <[email protected]>

* fix: The logger had an odd number of arguments, making it panic (#2166)

Signed-off-by: tcassaert <[email protected]>
(cherry picked from commit eb48b34)
Signed-off-by: Yi Chen <[email protected]>

* Upgrade to Spark 3.5.2(#2012) (#2157)

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <[email protected]>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <[email protected]>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <[email protected]>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <[email protected]>

---------

Signed-off-by: HyukSangCho <[email protected]>
(cherry picked from commit 9f0c08a)
Signed-off-by: Yi Chen <[email protected]>

* Feature: Add pprof endpoint (#2164)

* add pprof support to the operator Controller Manager

Signed-off-by: ImpSy <[email protected]>

* add pprof support to helm chart

Signed-off-by: ImpSy <[email protected]>

---------

Signed-off-by: ImpSy <[email protected]>
(cherry picked from commit 75b9266)
Signed-off-by: Yi Chen <[email protected]>

* fix the make kind-delete-custer to avoid accidental kubeconfig deletion (#2172)

Signed-off-by: ImpSy <[email protected]>
(cherry picked from commit cbfefd5)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/aws/aws-sdk-go-v2/config from 1.27.27 to 1.27.33 (#2174)

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.27.27 to 1.27.33.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@config/v1.27.27...config/v1.27.33)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit b818332)
Signed-off-by: Yi Chen <[email protected]>

* Bump helm.sh/helm/v3 from 3.15.3 to 3.16.1 (#2173)

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.15.3 to 3.16.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.15.3...v3.16.1)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit f3f80d4)
Signed-off-by: Yi Chen <[email protected]>

* Add specific error in log line when failed to create web UI service (#2170)

* Add specific error in log line when failed to create web UI service

Signed-off-by: tcassaert <[email protected]>

* Update log to reflect correct resource that could not be created

Co-authored-by: Yi Chen <[email protected]>
Signed-off-by: tcassaert <[email protected]>

---------

Signed-off-by: tcassaert <[email protected]>
Signed-off-by: tcassaert <[email protected]>
Co-authored-by: Yi Chen <[email protected]>
(cherry picked from commit ed3226e)
Signed-off-by: Yi Chen <[email protected]>

* Account for spark.executor.pyspark.memory in Yunikorn gang scheduling (#2178)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit a2f71c6)
Signed-off-by: Yi Chen <[email protected]>

* Fix: spark application does not respect time to live seconds (#2165)

* Add time to live seconds example spark application

Signed-off-by: Yi Chen <[email protected]>

* fix: spark application does not respect time to live seconds

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit c855ee4)
Signed-off-by: Yi Chen <[email protected]>

* Update release workflow and docs (#2121)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit bca6aa8)
Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Jacob Salway <[email protected]>
Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: pengfei4.li <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Kevin Wu <[email protected]>
Signed-off-by: Kevin.Wu <[email protected]>
Signed-off-by: tcassaert <[email protected]>
Signed-off-by: HyukSangCho <[email protected]>
Signed-off-by: ImpSy <[email protected]>
Signed-off-by: tcassaert <[email protected]>
Co-authored-by: Jacob Salway <[email protected]>
Co-authored-by: Neo <[email protected]>
Co-authored-by: pengfei4.li <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevinz <[email protected]>
Co-authored-by: Kevin Wu <[email protected]>
Co-authored-by: tcassaert <[email protected]>
Co-authored-by: ha2hi <[email protected]>
Co-authored-by: Sébastien Maintrot <[email protected]>
jbhalodia-slack pushed a commit to jbhalodia-slack/spark-operator that referenced this pull request Oct 4, 2024
Successfully merging this pull request may close these issues.

The e2e integration tests are currently broken