-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow running MPI applications as non-root #383
Conversation
/hold |
974007f
to
fb30471
Compare
Yes it should work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, but it's a fairly large change and it's easy to miss things. Could you confirm that it was tested end-to-end in addition to unit tests?
examples/pi/Dockerfile
Outdated
RUN useradd -m mpiuser | ||
WORKDIR /home/mpiuser | ||
COPY --chown=mpiuser sshd_config .sshd_config | ||
RUN cat /etc/ssh/ssh_config | grep -Ev 'StrictHostKeyChecking' > /etc/ssh/ssh_config.new && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe something like this would be more explicit and simpler?
sed -i 's/( )StrictHostKeyChecking ./\1StrictHostKeyChecking no/g' /etc/ssh/ssh_config.new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified it a bit differently, but thanks for the suggestion :)
@@ -74,21 +74,19 @@ const ( | |||
sshAuthVolume = "ssh-auth" | |||
sshAuthMountPath = "/mnt/ssh" | |||
sshHomeVolume = "ssh-home" | |||
// TODO(alculquicondor): Make home directory configurable through the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thing to consider: would it make sense to make the whole home directory configurable as opposed to just .ssh directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that. But someone creating a container image might want not to use $HOME/.ssh
but a completely different location. And, at the moment, I can't think of other uses of the home folder from the controller perspective.
RestartPolicy: corev1.RestartPolicyNever, | ||
Containers: []corev1.Container{ | ||
{ | ||
Env: func() []corev1.EnvVar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can all of this be replaced by "append(ompiEnvVars, nvidiaDisableEnvVars...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it, but not sure if it will have consequences on the allocation of ompiEnvVars
another alternative would be:
append(append([]corev1.EnvVar(nil), ompiEnvVars...), nvidiaDisableEnvVars...)
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've just played with append a little and it's a little funny. [1]: "If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated."
[1] https://pkg.go.dev/builtin#append
So in this case "append(ompiEnvVars, nvidiaDisableEnvVars...)" will work the same as "append(append(...)", but if you allocate additional memory for the ompiEnvVars slice, it will have a different effect. Examples:
a := []int{1} b := append(a[:1], []int{3}...) a[0] = 2 b[0] = 3 fmt.Printf("%d %d\n", a[0], b[0])
<- this prints "2 3"
a := []int{1, 2} b := append(a[:1], []int{3}...) a[0] = 2 b[0] = 3 fmt.Printf("%d %d\n", a[0], b[0])
<- this prints "3 3"
So append(append([]corev1.EnvVar(nil), ompiEnvVars...), nvidiaDisableEnvVars...) would be better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that will be more predictable. I decided to add a helper method.
fb30471
to
89ae0e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note in the description on what I did to test the change.
@@ -74,21 +74,19 @@ const ( | |||
sshAuthVolume = "ssh-auth" | |||
sshAuthMountPath = "/mnt/ssh" | |||
sshHomeVolume = "ssh-home" | |||
// TODO(alculquicondor): Make home directory configurable through the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that. But someone creating a container image might want not to use $HOME/.ssh
but a completely different location. And, at the moment, I can't think of other uses of the home folder from the controller perspective.
RestartPolicy: corev1.RestartPolicyNever, | ||
Containers: []corev1.Container{ | ||
{ | ||
Env: func() []corev1.EnvVar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it, but not sure if it will have consequences on the allocation of ompiEnvVars
another alternative would be:
append(append([]corev1.EnvVar(nil), ompiEnvVars...), nvidiaDisableEnvVars...)
WDYT?
89ae0e1
to
1d6ec25
Compare
66a5d91
to
1f59983
Compare
/hold cancel |
/assign @terrytangyuan @zw0610 @carmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this pr does more than allowing non-root mode. If so, I would prefer this pr comes with multiple commits. What do you think? @terrytangyuan
@@ -56,6 +56,9 @@ func SetDefaults_MPIJob(mpiJob *MPIJob) { | |||
if mpiJob.Spec.SlotsPerWorker == nil { | |||
mpiJob.Spec.SlotsPerWorker = newInt32(1) | |||
} | |||
if mpiJob.Spec.SSHAuthMountPath == "" { | |||
mpiJob.Spec.SSHAuthMountPath = "/root/.ssh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you make the default path ("/root/.ssh"
) a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense for controller code, where a value might be used more than once. However, this file is all about defaults. Creating a constant would just make the reader have to jump from one line to another to see what the default for a field is.
@@ -46,6 +46,9 @@ func validateMPIJobSpec(spec *kubeflow.MPIJobSpec, path *field.Path) field.Error | |||
} else if !validCleanPolicies.Has(string(*spec.CleanPodPolicy)) { | |||
errs = append(errs, field.NotSupported(path.Child("cleanPodPolicy"), *spec.CleanPodPolicy, validCleanPolicies.List())) | |||
} | |||
if spec.SSHAuthMountPath == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this validation contradict with the defaulting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep validation independent from defaulting. But yes, if you put them together, this line of code would not be reachable.
} | ||
podSpec.InitContainers = append(podSpec.InitContainers, corev1.Container{ | ||
Name: "init-ssh", | ||
Image: "alpine:3.14", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we make this image configurable? some cluster environment is isolated from docker registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Name: sshHomeVolume, | ||
MountPath: sshHomeMountPath, | ||
MountPath: "/mnt/home-ssh", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we replace constant with hardcoded string here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a different value now (it was /root/.ssh). But sure, putting it as a constant.
Adds the spec field sshAuthMountPath for MPIJob. The init script sets the permissions and ownership based on the securityContext of the launcherPod
1f59983
to
15a6e78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the sample in a separate commit
@@ -56,6 +56,9 @@ func SetDefaults_MPIJob(mpiJob *MPIJob) { | |||
if mpiJob.Spec.SlotsPerWorker == nil { | |||
mpiJob.Spec.SlotsPerWorker = newInt32(1) | |||
} | |||
if mpiJob.Spec.SSHAuthMountPath == "" { | |||
mpiJob.Spec.SSHAuthMountPath = "/root/.ssh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense for controller code, where a value might be used more than once. However, this file is all about defaults. Creating a constant would just make the reader have to jump from one line to another to see what the default for a field is.
@@ -46,6 +46,9 @@ func validateMPIJobSpec(spec *kubeflow.MPIJobSpec, path *field.Path) field.Error | |||
} else if !validCleanPolicies.Has(string(*spec.CleanPodPolicy)) { | |||
errs = append(errs, field.NotSupported(path.Child("cleanPodPolicy"), *spec.CleanPodPolicy, validCleanPolicies.List())) | |||
} | |||
if spec.SSHAuthMountPath == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep validation independent from defaulting. But yes, if you put them together, this line of code would not be reachable.
Name: sshHomeVolume, | ||
MountPath: sshHomeMountPath, | ||
MountPath: "/mnt/home-ssh", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a different value now (it was /root/.ssh). But sure, putting it as a constant.
} | ||
podSpec.InitContainers = append(podSpec.InitContainers, corev1.Container{ | ||
Name: "init-ssh", | ||
Image: "alpine:3.14", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kawych, terrytangyuan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Part of #373
Tested: unit and integration suites as well as manual runs.