Failed to get resource list for visibility.kueue.x-k8s.io/v1alpha1 #1519

Closed
tenzen-y opened this issue Dec 26, 2023 · 11 comments · Fixed by #1746
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tenzen-y
Member

What happened:
When we run kubectl get resourceflavor, we get the following errors:

$ kubectl get resourceflavor
E1215 18:17:09.715081   24954 memcache.go:255] couldn't get resource list for visibility.kueue.x-k8s.io/v1alpha1: the server is currently unable to handle the request
E1215 18:17:09.723755   24954 memcache.go:106] couldn't get resource list for visibility.kueue.x-k8s.io/v1alpha1: the server is currently unable to handle the request
No resources found

What you expected to happen:
No errors are printed.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
We found this bug in #1459

Also, we haven't seen this error on K8s v1.27, v1.28, or v1.29.

Environment: KinD

  • Kubernetes version (use kubectl version): v1.26.3
  • Kueue version (use git describe --tags --dirty --always): main branch
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@tenzen-y tenzen-y added the kind/bug Categorizes issue or PR as related to a bug. label Dec 26, 2023
@tenzen-y
Member Author

cc: @mimowo @B1F030

@jrleslie

jrleslie commented Feb 5, 2024

@tenzen-y - are there any plans to address this one? I can confirm we're seeing the same error on Kubernetes 1.26 with the latest Helm chart deployed from the main branch.

@B1F030
Member

B1F030 commented Feb 6, 2024

If we install kueue via helm with the default config, kube-apiserver logs errors like this:

E0206 02:48:46.207798       1 available_controller.go:527] v1alpha1.visibility.kueue.x-k8s.io failed with:
failing or missing response from https://10.233.23.237:443/apis/visibility.kueue.x-k8s.io/v1alpha1:
Get "https://10.233.23.237:443/apis/visibility.kueue.x-k8s.io/v1alpha1":
dial tcp 10.233.23.237:443: connect: connection refused

@jrleslie In my environment, enabling the VisibilityOnDemand feature gate in kueue/charts/kueue/values.yaml made the error go away, so you could try that first.

controllerManager:
  featureGates:
    - name: VisibilityOnDemand
      enabled: true

I think this error is probably caused by this feature as well, so it is a good place to start if we want to track down the bug. @tenzen-y

@kerthcet
Contributor

kerthcet commented Feb 6, 2024

I think this is a regression on kubernetes/kubernetes#115978, but that was cherry-picked to 1.26. Can you try v1.26.5?

@kerthcet
Contributor

kerthcet commented Feb 6, 2024

Let me take that back. I found new clues.

@alculquicondor
Contributor

The visibility server is disabled by default, so maybe that has something to do with it?

@trasc can you take a look?

@alculquicondor
Contributor

In helm, we could make the installation of the API optional: https://github.com/kubernetes-sigs/kueue/blob/main/config/components/visibility/apiservice.yaml

But in kustomize... I guess we should comment out this line:

- ../components/visibility
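
A minimal sketch of what that could look like, assuming the line sits under resources in a kustomization.yaml. The other entry is a placeholder for illustration, not the actual file contents:

```yaml
# Hypothetical excerpt of a kustomization.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# Placeholder entry (assumption); the real file lists other components here.
- ../components/manager
# Commented out so the visibility APIService is not installed
# while the visibility server is disabled.
# - ../components/visibility
```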

@alculquicondor
Contributor

/assign @trasc

@trasc
Contributor

trasc commented Feb 15, 2024

We can add a dedicated overlay like we have for prometheus, and include it in alpha-enabled as well. For helm, maybe add an enableVisibility value.
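
As a rough sketch of the Helm side, the APIService template could be wrapped in a conditional. The enableVisibility name comes from the suggestion above. Its position at the top level of values.yaml, the Service name, and the priority numbers are assumptions for illustration:

```yaml
# Hypothetical Helm template for the visibility APIService.
# Rendered only when enableVisibility is set to true in values.yaml (assumed location).
{{- if .Values.enableVisibility }}
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.visibility.kueue.x-k8s.io
spec:
  group: visibility.kueue.x-k8s.io
  version: v1alpha1
  groupPriorityMinimum: 100   # illustrative value
  versionPriority: 100        # illustrative value
  insecureSkipTLSVerify: true
  service:
    name: kueue-visibility-server   # assumed Service name
    namespace: {{ .Release.Namespace }}
{{- end }}
```

With the value left at false by default, the API server would not register an APIService that has no backend to answer it.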

@alculquicondor
Contributor

Those sound like good plans

@minierm

minierm commented Feb 23, 2024

FYI, to help others until the issue is fixed:
I enabled the feature by adding a line after line 11182 in manifests.yaml, and the errors are gone:

  containers:
  - args:
    - --config=/controller_manager_config.yaml
    - --zap-log-level=2
+   - --feature-gates=VisibilityOnDemand=true
    command:
    - /manager
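
For kustomize users, the same flag could probably be added with a patch instead of editing manifests.yaml by hand. This is an untested sketch; the Deployment name and the container index 0 are assumptions:

```yaml
# Hypothetical kustomization.yaml placed next to a downloaded manifests.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- manifests.yaml
patches:
- target:
    kind: Deployment
    name: kueue-controller-manager   # assumed Deployment name
  patch: |-
    # Append the feature gate to the args of the first container (assumed to be the manager).
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: --feature-gates=VisibilityOnDemand=true
```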
