How to create TF Jobs from the user side? #67
@MarkusTeufelberger Currently, yes, this is the only way. You need to build the container on your end and grab the
Config via the environment is fine; it is just more or less documented only in code right now. Also, in both cases (arguments/environment), some custom logic besides the actual machine-learning code is required to make sure the script knows which role it should assume (the boilerplate code in main() in https://github.com/jlewi/mlkube.io/blob/master/examples/tf_sample/tf_sample/tf_smoke.py). Ideally, it would look like the initial example on https://www.tensorflow.org/deploy/distributed, just with the k8s cluster as the target instead of a server on localhost. Something similar to:
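A rough sketch of what that could look like, assuming a hypothetical `tf-cluster` Service exposing the gRPC port (the Service name, namespace, and port are assumptions; the TensorFlow 1.x calls from the deploy guide are shown as comments since they need a live cluster to run):

```python
def cluster_target(service="tf-cluster.default.svc", port=2222):
    """Build the gRPC target for a session aimed at the k8s cluster
    instead of localhost. Service name and port are placeholders."""
    return "grpc://%s:%d" % (service, port)

# With TensorFlow installed, the deploy-guide hello world would become:
#   import tensorflow as tf
#   c = tf.constant("Hello, distributed TensorFlow!")
#   with tf.Session(cluster_target()) as sess:
#       print(sess.run(c))
```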
Right now it seems to be more like:
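For reference, the role-selection scaffolding "right now" boils down to something like this sketch (function names and the exact shape of the environment blob are illustrative, loosely based on the main() boilerplate in tf_smoke.py):

```python
import json
import os

def pick_role(env=None):
    """Parse the JSON cluster/task description injected via the
    environment and decide which role this replica should play."""
    env = os.environ if env is None else env
    tf_config = json.loads(env.get("TF_CONFIG", "{}"))
    task = tf_config.get("task", {})
    return task.get("type", "master"), int(task.get("index", 0))

# A worker replica's environment might carry something like:
worker_env = {"TF_CONFIG": json.dumps({
    "cluster": {"master": ["m:2222"], "worker": ["w0:2222", "w1:2222"]},
    "task": {"type": "worker", "index": 1},
})}
```

The point of the question stands: every user script has to carry a variant of this before any actual model code runs.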
Thanks for taking the time to try it out and provide feedback. For staging your code, building Docker containers is one approach, and it is common in K8s. Another approach would be to use a shared filesystem (like NFS). You could then mount your code into the job via volume mounts. If the filesystem is also mountable on your dev box, then you can edit code and make it available to your jobs without building Docker images. In this case you could just configure your job to do something like
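The volume-mount approach might look like this fragment of a replica spec (the NFS server, paths, and image tag below are placeholders, not from the original comment):

```yaml
containers:
  - name: tensorflow
    image: tensorflow/tensorflow:1.4.0   # stock image, no custom build
    command: ["python", "/mnt/code/tf_smoke.py"]
    volumeMounts:
      - name: code
        mountPath: /mnt/code
volumes:
  - name: code
    nfs:
      server: nfs.example.internal       # placeholder NFS server
      path: /exported/code
```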
You can find more information about NFS and K8s here. How you set up NFS will depend on your cluster environment. You only need to parse
You could do this today using K8s but TfJob isn't designed for this case; please consider filing a feature request if you think it would be useful to support this better. Here's how you could do it today.
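One hedged sketch of "doing it today" with plain K8s (not TfJob): ship the user's script in a ConfigMap and mount it into a stock image, so no custom container build is needed. All names, paths, and the image tag here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-code
data:
  model.py: |
    def run():
        print("hello from mounted code")
    run()
---
apiVersion: v1
kind: Pod
metadata:
  name: tf-user-job
spec:
  restartPolicy: Never
  containers:
    - name: tensorflow
      image: tensorflow/tensorflow:1.4.0
      command: ["python", "/opt/user/model.py"]
      volumeMounts:
        - name: user-code
          mountPath: /opt/user
  volumes:
    - name: user-code
      configMap:
        name: user-code
```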
@jlewi It hasn't yet... Until then, you might find success using @mattmoor's rules_k8s https://github.com/bazelbuild/rules_k8s + https://github.com/bazelbuild/rules_jsonnet, which supports a similar-ish workflow if you want to try to get something up and running now.
@jlewi I'm happy to send you pointers or give you a demo. The
@mgyucht @mattmoor Thanks for the pointers. @MarkusTeufelberger I'm going to close this issue. Feel free to reopen if your question hasn't been addressed.
I am wondering what the actual procedure would be to go from a simple `hello world` example to actually deploying it on an mlkube-enabled k8s cluster. Your example seems to include a Docker container and setting the script (which also has to contain some rather specific scaffolding, like reading config from the environment) as `ENTRYPOINT`. Is that the recommended way? Or even the only way? Ideally, I'd like to just map in a file containing the `run()` function via a volume and avoid forcing everyone to include the scaffolding, or something like that. Maybe I'm missing something, though.