-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CRD] Request for input and output dirs in TFJobSpec #224
Comments
How would these values be passed to the user's code? What's the advantage of this proposal over just letting the user specify command line arguments or environment variables as they see fit? Why not let users rely on their tool of choice (e.g. helm or ksonnet) to specify the TfJob spec conveniently? |
@jlewi Hi Jeremy, I think it's a trade-off between flexibility and complexity. We can divide users into two kinds according to the familiarity with TensorFlow. One is the TensorFlow expert who always know how to train, visualize and serve models. Another is the TensorFlow newbie who is in the process of learning TensorFlow API.
|
Suppose I don't understand TensorFlow very well and I just copied someone else's TensorFlow example. If I look at the proposed TfJob spec then I'd probably expect if I just specify LogDir, DataDir, and ModelDir that this would set the corresponding values for my TF program. But this would most likely not work. First I'd need to understand how TfJob passes those values into the TensorFlow program for example which command line arguments it is using. I'd then need to figure out which command line arguments my program expects. Only if TfJob was using the same command line arguments as my TF program would this work. So a user has to understand how information is passed into their TF program (e.g. which command line arguments it uses) regardless. So its not clear to me that having TfJob give special treatment to certain arguments really helps. It seems like this just adds another layer that users would have to understand. |
@jlewi Thanks for your replies. Since TensorFlow is just a ML lib instead of framework such as Hadoop/Spark, we have to trade-off the user flexibility and system completeness. Of course we can open all interfaces and configs to users. In this way, we just provide a tool to launch TensorFlow training jobs and hard to provide a |
@DjangoPeng I agree there's room for providing simpler APIs and tools. My preference though would be to build this as a separate layer on top of the core layers. For example, if you want a simpler and more constrained API then TfJobSpec then why not define a new CRD with the desired API and implemented as a lambda using kube-metacontroller that just converts the spec to a TfJobSpec. The reason I like this layering approach is because the higher up the stack you go the more variability we will have. For example, we might have multiple lambdas corresponding to different TF frameworks (such as TF Estimator API). If there's a clear separation of the higher level tooling and the underlying core APIs then its easier to experiment with a variety of different higher level APIs without affecting the core components. If we start adding fields to the TfJob spec to make certain use cases easier, I think we are more likely to wind up in a place where the TfJob API isn't clean and this will complicate the implementation since will have to add logic to handle all the different cases. |
@gaocegege @DjangoPeng What do you think? Can we close this or do you think we still need these fields? |
I think we could close the issue. |
There is one input dir:
DataDir
and three output dirsModelDir
,LogDir
, andExportDir
in a typical training job.DataDir
is the path to dataset,ModelDir
is the path to checkpoint, graph and etc.LogDir
is the path to the log, andExportDir
is the path to the saved model.These dirs are often specified by users, then could we add them as an optional config in TFJobSpec?
For example,
/cc @DjangoPeng
label/feature
The text was updated successfully, but these errors were encountered: