-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add putfiles
command and add parse
functions on controller.
#496
Conversation
} else if os.IsNotExist(pkgerr) { | ||
return fmt.Errorf("stat jobpackage '%s' error: %v", jobPackage, pkgerr) | ||
glog.Warning("jobpackage not a local dir, skip upload.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a merge blocker, but maybe we can switch to log15
gradually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Maybe we need an individual PR to fix this.
go/controller/controller.go
Outdated
m := parser.ParseToMaster(job) | ||
|
||
b, _ := json.MarshalIndent(p, "", " ") | ||
log.Debug("create pserver:" + string(b[:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b
instead of b[:]
is more idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.Thanks.
go/controller/controller.go
Outdated
|
||
// TODO(gongwb): create them | ||
// just like: | ||
// namespace := job.ObjectMeta.Namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this code in your notes instead :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.Thanks.
go/controller/jobparser.go
Outdated
// TODO: create job. | ||
return &batchv1.Job{} | ||
replicas := int32(job.Spec.Trainer.MinInstance) | ||
command := make([]string, 2, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is same as make([]string, 2)
, which is typically used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.Thanks.
go/controller/jobparser.go
Outdated
v1.Container{ | ||
Name: "master", | ||
Image: job.Spec.Image, | ||
ImagePullPolicy: "Always", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImagePullPolicy
好几个地方都用到了,可以定义成一个const,或者启动参数
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
go/controller/jobparser.go
Outdated
@@ -171,23 +297,42 @@ func podEnv(job *paddlejob.TrainingJob) []v1.EnvVar { | |||
v1.EnvVar{Name: "PADDLE_INIT_NUM_GRADIENT_SERVERS", Value: strconv.Itoa(job.Spec.Trainer.MinInstance)}, | |||
v1.EnvVar{Name: "PADDLE_INIT_NUM_PASSES", Value: strconv.Itoa(job.Spec.Passes)}, | |||
v1.EnvVar{Name: "PADDLE_INIT_USE_GPU", Value: needGPU}, | |||
|
|||
// FIXME(gongwb): LD_LIBRARY_PATH? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LD_LIBRARY_PATH
需要用来增加CUDA相关库的目录
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
go/controller/jobparser.go
Outdated
v1.Volume{ | ||
Name: job.ObjectMeta.Name + "-workspace", | ||
VolumeSource: v1.VolumeSource{ | ||
HostPath: &v1.HostPathVolumeSource{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以加个TODO,需要增加cephfs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Volume需要区分使用的文件系统:ceph, hostpath,还需要同时挂载用户目录和public目录
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM++
Fix #484