Update DaemonSet API for GPU driver installation #538

Merged 1 commit on Dec 6, 2019


jinchihe (Member) commented Dec 6, 2019

Fixes: #537

Since the manifest's API version is apps/v1, we should use kubernetes.client.AppsV1Api instead of kubernetes.client.ExtensionsV1beta1Api:

https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
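
As a rough illustration of the point above, the client class can be chosen from the manifest's `apiVersion` rather than hard-coded. This is only a sketch, not the code changed in this PR: the names `API_CLASS_BY_VERSION`, `api_class_name_for`, and `create_daemonset` are hypothetical, the `kube-system` default namespace is an assumption, and `ExtensionsV1beta1Api` is only present in older releases of the Python client.

```python
# Hypothetical mapping from a manifest's apiVersion to the name of the
# kubernetes.client API class that serves it.
API_CLASS_BY_VERSION = {
    "apps/v1": "AppsV1Api",
    "extensions/v1beta1": "ExtensionsV1beta1Api",  # older client releases only
}


def api_class_name_for(manifest):
    """Return the client class name matching the manifest's apiVersion."""
    version = manifest.get("apiVersion")
    try:
        return API_CLASS_BY_VERSION[version]
    except KeyError:
        raise ValueError("unsupported DaemonSet apiVersion: %r" % (version,))


def create_daemonset(manifest, namespace="kube-system"):
    """Create the DaemonSet through the API class matching its apiVersion."""
    # Imported lazily so the mapping above works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # assumes a local kubeconfig is available
    api = getattr(client, api_class_name_for(manifest))()
    return api.create_namespaced_daemon_set(namespace=namespace, body=manifest)
```

With the manifest linked above, `api_class_name_for` resolves to `AppsV1Api`, so a later change of the upstream `apiVersion` fails with an explicit error instead of being sent to the wrong API group.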


jinchihe (Member Author) commented Dec 6, 2019

/hold
I'm going to double-check my changes in the TFJob CI tests.

jinchihe (Member Author) commented Dec 6, 2019

It seems the stable version of the definition file still has a problem:

 File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 364, in request
   body=body)
 File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/rest.py", line 266, in POST
   body=body)
 File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/rest.py", line 222, in request
   raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 06 Dec 2019 16:35:06 GMT', 'Audit-Id': '12e54e76-7d7f-4d54-9e44-c518e6c9e4fc', 'Content-Length': '684', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"DaemonSet.apps \"nvidia-driver-installer\" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{\"k8s-app\":\"nvidia-driver-installer\", \"name\":\"nvidia-driver-installer\"}: `selector` does not match template `labels`","reason":"Invalid","details":{"name":"nvidia-driver-installer","group":"apps","kind":"DaemonSet","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: map[string]string{\"k8s-app\":\"nvidia-driver-installer\", \"name\":\"nvidia-driver-installer\"}: `selector` does not match template `labels`","field":"spec.template.metadata.labels"}]},"code":422}
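
The 422 above is the API server rejecting a DaemonSet whose `spec.selector` does not select the pod template's labels. A check along the following lines could catch that before the request is sent. It is a minimal sketch under stated assumptions: `selector_matches_template` is a hypothetical helper, it only looks at `matchLabels` (any `matchExpressions` are ignored), and it treats a missing or empty selector as a mismatch.

```python
def selector_matches_template(manifest):
    """Return True if every matchLabels pair appears in the template labels.

    Assumption: only spec.selector.matchLabels is checked; matchExpressions
    are ignored. A missing or empty selector is reported as a mismatch.
    """
    spec = manifest.get("spec") or {}
    match_labels = (spec.get("selector") or {}).get("matchLabels") or {}
    template = spec.get("template") or {}
    template_labels = (template.get("metadata") or {}).get("labels") or {}
    if not match_labels:
        return False
    return all(template_labels.get(k) == v for k, v in match_labels.items())


# Template labels taken from the error message; the selectors are made up
# for illustration only.
template = {
    "metadata": {
        "labels": {
            "k8s-app": "nvidia-driver-installer",
            "name": "nvidia-driver-installer",
        }
    }
}
good = {"spec": {"selector": {"matchLabels": {"k8s-app": "nvidia-driver-installer"}},
                 "template": template}}
bad = {"spec": {"selector": {"matchLabels": {"k8s-app": "something-else"}},
                "template": template}}
```

Here `selector_matches_template(good)` holds and `selector_matches_template(bad)` does not, which mirrors the condition the server enforces for `apps/v1` DaemonSets.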

jinchihe (Member Author) commented Dec 6, 2019

The above problem has been fixed and verified in the TFJob CI tests; it works fine now.
/hold cancel

/cc @jlewi @zhenghuiwang
Could you please take a look with high priority? This blocks the TFJob tests. Thanks a lot!

zhenghuiwang (Contributor) commented

/lgtm

zhenghuiwang (Contributor) commented

/approve

k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhenghuiwang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Successfully merging this pull request may close these issues.

Failed set up cluster while installing GPU Drivers