shareProcessNamespace not working with TFJob #902

Closed
dpaks opened this issue Dec 20, 2018 · 5 comments

dpaks commented Dec 20, 2018

I'm using Kubeflow v0.3.4 and Kubernetes 1.12.

shareProcessNamespace works when used in a plain Pod spec.

apiVersion: v1
kind: Pod
metadata:
  name: "dpak-shared"
spec:
  shareProcessNamespace: true
  containers:
  - image: "ubuntu:16.04"
    name: ubuntu
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
  - image: "datascience-tf-cpu:v1"
    name: tensorflow
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]

shareProcessNamespace doesn't work when the same pod template is used in a TFJob spec.

apiVersion: "kubeflow.org/v1alpha2"
kind: TFJob
metadata:
  name: "dpak-shared-job"
spec:
  cleanPodPolicy: ALL
  tfReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          shareProcessNamespace: true
          containers:
          - image: "ubuntu:16.04"
            name: ubuntu
            imagePullPolicy: IfNotPresent
            command: ["sleep", "infinity"]
            securityContext:
              capabilities:
                add:
                - SYS_PTRACE
          - image: "datascience-tf-cpu:v1"
            name: tensorflow
            imagePullPolicy: IfNotPresent
            command: ["sleep", "infinity"]

I tried both on the same Ubuntu 18.04 machine (kernel v4.15).

@gaocegege
Member

I think we need to reproduce it first. We use the template directly to create the pod: https://github.com/kubeflow/tf-operator/blob/master/pkg/controller.v1beta1/tensorflow/pod.go#L178. It should work well with all fields in the template.

@jlewi
Contributor

jlewi commented Jan 4, 2019

@dpaks Can you provide the full pod spec for the actual pods created by the TFJob? For example:

kubectl get pods -o yaml ${MASTER_POD}

As @gaocegege says, tf-operator should be passing the fields through directly to the pod template spec. If it's not, then it's a bug in tf-operator; if it is, then something outside tf-operator is going on.
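
The check suggested above can also be scripted. The following is only a minimal sketch in Go, assuming the pod has been dumped with `kubectl get pod <name> -o json`; the `podView` type and the `sharesProcessNamespace` helper are hypothetical names introduced for illustration and are not part of tf-operator or client-go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podView captures only the part of a Pod object that this check needs.
// It is a hypothetical helper type, not the real k8s.io/api Pod type.
type podView struct {
	Spec struct {
		ShareProcessNamespace *bool `json:"shareProcessNamespace"`
	} `json:"spec"`
}

// sharesProcessNamespace reports whether spec.shareProcessNamespace is
// present and set to true in the given pod JSON.
func sharesProcessNamespace(podJSON []byte) (bool, error) {
	var p podView
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return false, err
	}
	return p.Spec.ShareProcessNamespace != nil && *p.Spec.ShareProcessNamespace, nil
}

func main() {
	// Hypothetical, heavily trimmed pod objects used as sample input.
	passedThrough := []byte(`{"spec":{"shareProcessNamespace":true,"containers":[]}}`)
	dropped := []byte(`{"spec":{"containers":[]}}`)

	for _, doc := range [][]byte{passedThrough, dropped} {
		ok, err := sharesProcessNamespace(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("shareProcessNamespace set:", ok)
	}
}
```

If the field is missing from the pod that the TFJob created, the template was altered before the pod was submitted; if it is present, the cause lies outside tf-operator.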

@gaocegege
Member

I am closing the issue since it is stale. But feel free to leave comments if you still have problems.

@zionwu
Contributor

zionwu commented Aug 10, 2020

I have the same issue when using tf-operator v0.3.0.
I looked into the code and found that this is because tf-operator vendors v1.9 of k8s.io/api, and shareProcessNamespace is not yet supported in that release: https://github.com/kubeflow/tf-operator/blob/v0.3.0/vendor/k8s.io/api/core/v1/types.go#L2716
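
To illustrate why an older vendored type could lose the field, here is a minimal sketch in Go using only the standard library. The `oldPodSpec` and `newPodSpec` types are simplified, hypothetical stand-ins (not the real k8s.io/api definitions), and the sketch assumes the operator decodes the template into typed structs before creating the pod. By default, `encoding/json` silently ignores keys that have no matching struct field, so a round trip through the older type drops `shareProcessNamespace`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// container is a simplified, hypothetical stand-in for a container spec.
type container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

// oldPodSpec stands in for a PodSpec type that predates the
// shareProcessNamespace field. It is not the real k8s.io/api type.
type oldPodSpec struct {
	Containers []container `json:"containers"`
}

// newPodSpec is the same hypothetical type with the field added.
type newPodSpec struct {
	ShareProcessNamespace *bool       `json:"shareProcessNamespace,omitempty"`
	Containers            []container `json:"containers"`
}

// roundTripOld decodes a pod spec into the old type and re-encodes it.
// Unknown keys are ignored during decoding, so they are absent afterwards.
func roundTripOld(in []byte) (string, error) {
	var spec oldPodSpec
	if err := json.Unmarshal(in, &spec); err != nil {
		return "", err
	}
	out, err := json.Marshal(spec)
	return string(out), err
}

// roundTripNew does the same with the type that knows about the field.
func roundTripNew(in []byte) (string, error) {
	var spec newPodSpec
	if err := json.Unmarshal(in, &spec); err != nil {
		return "", err
	}
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	in := []byte(`{"shareProcessNamespace":true,"containers":[{"name":"ubuntu","image":"ubuntu:16.04"}]}`)

	oldOut, err := roundTripOld(in)
	if err != nil {
		panic(err)
	}
	newOut, err := roundTripNew(in)
	if err != nil {
		panic(err)
	}

	fmt.Println("old types:", oldOut) // field is gone
	fmt.Println("new types:", newOut) // field survives
}
```

Under that assumption, the fix would be to run an operator build whose vendored k8s.io/api includes the field, rather than to change the TFJob manifest.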

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/tfjob 0.85

