Coordinator field not set when jobset was installed with version before coordinator support then upgraded #701

Closed
avrittrohwer opened this issue Nov 7, 2024 · 4 comments · May be fixed by #702
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

avrittrohwer commented Nov 7, 2024

What happened:

In a cluster where the JobSet API was first installed at a version without coordinator support and then upgraded to a version with coordinator support, submitting a JobSet YAML with the coordinator field set results in a stored JobSet whose coordinator field is nil.

What you expected to happen:

The coordinator field is preserved (not nil) in the stored JobSet.

How to reproduce it (as minimally and precisely as possible):

This is reproducible in a kind cluster with the following JobSet, saved as js.yaml:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: js
spec:
  coordinator:
    replicatedJob: leader
    jobIndex: 0
    podIndex: 0
  replicatedJobs:
  - name: leader
    replicas: 1
    template:
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            containers:
            - name: progrsm
              image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
              args:
              - 10m

  1. kind create cluster
  2. Install a version without coordinator support: kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.4.0/manifests.yaml
  3. kubectl apply -f js.yaml (expected to fail): Error from server (BadRequest): error when creating "js.yaml": JobSet in version "v1alpha2" cannot be handled as a JobSet: strict decoding error: unknown field "spec.coordinator"
  4. Upgrade to a version with coordinator support: kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.0/manifests.yaml
  5. kubectl apply -f js.yaml
  6. Check the stored object; the coordinator field is nil: kubectl get jobsets js -o yaml | grep coordinator

Anything else we need to know?:

  • If I rename the field in the spec to coordinatorz, the apply fails: strict decoding error: unknown field "spec.coordinatorz"
  • If I change the coordinator.replicatedJob field to def-not-exists, the apply succeeds. So I think the coordinator field is most likely being set to nil in the CRD conversion webhook, before the validation webhook runs.
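
The reasoning behind the second observation can be sketched in Go. This is only an illustration under stated assumptions: JobSetSpec, dropCoordinator, and validate are hypothetical stand-ins, not JobSet's actual types or webhook code. It assumes validation rejects a coordinator that names a missing replicated job, and skips the check when the coordinator is nil.

```go
package main

import "fmt"

// Coordinator and JobSetSpec are hypothetical, trimmed-down stand-ins
// for the real JobSet API types.
type Coordinator struct {
	ReplicatedJob string
}

type JobSetSpec struct {
	Coordinator    *Coordinator
	ReplicatedJobs []string // replicated job names only
}

// dropCoordinator stands in for whichever earlier step clears the field.
// It works on a copy, so the caller's spec is left untouched.
func dropCoordinator(s JobSetSpec) JobSetSpec {
	s.Coordinator = nil
	return s
}

// validate is an assumed validation rule: a coordinator, if present,
// must reference an existing replicated job.
func validate(s JobSetSpec) error {
	if s.Coordinator == nil {
		return nil // nothing to check
	}
	for _, name := range s.ReplicatedJobs {
		if name == s.Coordinator.ReplicatedJob {
			return nil
		}
	}
	return fmt.Errorf("coordinator replicatedJob %q does not exist", s.Coordinator.ReplicatedJob)
}

func main() {
	spec := JobSetSpec{
		Coordinator:    &Coordinator{ReplicatedJob: "def-not-exists"},
		ReplicatedJobs: []string{"leader"},
	}
	fmt.Println("validated directly:", validate(spec))
	fmt.Println("validated after the field is dropped:", validate(dropCoordinator(spec)))
}
```

If validation saw the submitted object unchanged, the invalid reference would be rejected. An apply that succeeds is therefore consistent with the field being cleared before validation runs, whichever component does the clearing.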

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.31.1
  • JobSet version (use git describe --tags --dirty --always): v0.4.0 and v0.7.0
  • Cloud provider or hardware configuration: kind
  • Install tools: kind
  • Others:
avrittrohwer added the kind/bug label Nov 7, 2024
kannon92 (Contributor) commented Nov 7, 2024

cc @ahg-g.

avrittrohwer (Author) commented

After some more testing, the issue is only reproducible in the kind cluster when I submit the JobSet before the Deployment has deleted the v0.4.0 manager pod, which explains the behavior there. I originally saw this in a GKE cluster and used kind as a minimal repro, so I will keep investigating in the GKE cluster.

avrittrohwer (Author) commented

The issue is that Kueue was also installed in my cluster.

Kueue v0.9.0 is the first version whose vendored copy of the JobSet API includes the coordinator field: https://github.com/kubernetes-sigs/kueue/blob/release-0.9/vendor/sigs.k8s.io/jobset/api/jobset/v1alpha2/jobset_types.go

v0.8.0 does not have the field: https://github.com/kubernetes-sigs/kueue/blob/release-0.8/vendor/sigs.k8s.io/jobset/api/jobset/v1alpha2/jobset_types.go
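
A minimal Go sketch of how a stale copy of the types could lose the field. The thread does not say exactly how Kueue handles the object, so this is an assumption: the component decodes the JobSet into its own older struct and writes the result back. OldSpec, NewSpec, and both helper functions are hypothetical names, and the structs are heavily trimmed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Coordinator mirrors the three fields used in the issue's YAML.
type Coordinator struct {
	ReplicatedJob string `json:"replicatedJob"`
	JobIndex      int    `json:"jobIndex"`
	PodIndex      int    `json:"podIndex"`
}

type ReplicatedJob struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

// OldSpec is a hypothetical stand-in for a vendored spec that predates
// the coordinator field.
type OldSpec struct {
	ReplicatedJobs []ReplicatedJob `json:"replicatedJobs,omitempty"`
}

// NewSpec is a hypothetical stand-in for a spec that includes it.
type NewSpec struct {
	Coordinator    *Coordinator    `json:"coordinator,omitempty"`
	ReplicatedJobs []ReplicatedJob `json:"replicatedJobs,omitempty"`
}

// roundTripThroughOldSpec decodes raw into OldSpec and re-encodes it.
// json.Unmarshal ignores unknown fields by default, so anything OldSpec
// does not declare is lost without an error.
func roundTripThroughOldSpec(raw []byte) ([]byte, error) {
	var old OldSpec
	if err := json.Unmarshal(raw, &old); err != nil {
		return nil, err
	}
	return json.Marshal(old)
}

// hasCoordinator decodes raw with the newer types and reports whether
// the coordinator field is set.
func hasCoordinator(raw []byte) (bool, error) {
	var s NewSpec
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return s.Coordinator != nil, nil
}

func main() {
	submitted := []byte(`{"coordinator":{"replicatedJob":"leader","jobIndex":0,"podIndex":0},"replicatedJobs":[{"name":"leader","replicas":1}]}`)

	before, err := hasCoordinator(submitted)
	if err != nil {
		panic(err)
	}
	stored, err := roundTripThroughOldSpec(submitted)
	if err != nil {
		panic(err)
	}
	after, err := hasCoordinator(stored)
	if err != nil {
		panic(err)
	}
	fmt.Println("coordinator set before round trip:", before)
	fmt.Println("coordinator set after round trip:", after)
}
```

The decode step reports no error, which would match the symptom here: the apply succeeds and the field is simply absent from the stored object.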

ahg-g (Contributor) commented Nov 8, 2024

Do you want to open an issue in the Kueue repo to update their JobSet definition?
