#71 Simplify accelerators config #90
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
Hi @wbuchwalter. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
I signed it!
CLAs look good, thanks!
/ok-to-test
Thanks
README.md
Outdated
1. Deploy the operator | ||
For non-RBAC enabled clusters: | ||
``` | ||
CHART=https://storage.googleapis.com/tf-on-k8s-dogfood-releases/latest/tf-job-operator-chart-latest.tgz | ||
helm install ${CHART} -n tf-job --wait --replace | ||
helm install ${CHART} -n tf-job --wait --replace --set cloud=<gce or azure> |
gce->gke
README.md
Outdated
``` | ||
For RBAC-enabled clusters: | ||
``` | ||
CHART=https://storage.googleapis.com/tf-on-k8s-dogfood-releases/latest/tf-job-operator-chart-latest.tgz | ||
helm install ${CHART} -n tf-job --wait --replace --set rbac.install=true | ||
helm install ${CHART} -n tf-job --wait --replace --set rbac.install=true cloud=<gce or azure> |
gce->gke
README.md
Outdated
The TfJob controller can be configured with a list of volumes that should be mounted from the host into the container | ||
to make GPUs work. Here's an example [ControllerConfig](https://github.com/tensorflow/k8s/blob/master/pkg/spec/controller.go): | ||
For **GCE** |
GKE
README.md
Outdated
to make GPUs work. Here's an example [ControllerConfig](https://github.com/tensorflow/k8s/blob/master/pkg/spec/controller.go): | ||
For **GCE** | ||
``` | ||
helm install ${CHART} -n tf-job --wait --replace --set cloud=gce |
gce -> gke
README.md
Outdated
``` | ||
If the cluster is not hosted on GCE or Azure, you will need specify a custom configuration. | ||
To do so edit `${CHART}\custom-config.yaml` with your desired settings. |
I think editing the chart is undesirable because it requires the user download/unpack the chart.
What if instead of passing in a file to helm, the user has to create the config map manually. e.g.
kubectl create configmap tf-job-operator-config --from-file=path/to/bar
helm install ${CHART} -n tf-job --wait --replace --set cloud=none
if cloud=none then the helm chart doesn't include the tf-job-operator-config defined in the template.
Done.
One slight difference: you don't need to specify any value for `cloud`. As long as it is neither `gke` nor `azure`, the chart will not create a `ConfigMap`.
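For readers following the thread, the behaviour agreed on above could be sketched roughly as the shell helper below. This is only an illustration, not part of the PR: the `deploy` function, the `RUN` dry-run switch, and the `controller-config.yaml` path are hypothetical names introduced here, and the ConfigMap name `tf-job-operator-config` is taken from the reviewer's suggestion rather than verified against the chart.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (RUN=echo). Set RUN= to execute them.
CHART=https://storage.googleapis.com/tf-on-k8s-dogfood-releases/latest/tf-job-operator-chart-latest.tgz
CONFIG_FILE=${CONFIG_FILE:-controller-config.yaml}  # hypothetical path to a custom config
RUN=${RUN-echo}                                     # hypothetical dry-run switch

deploy() {
  cloud=${1:-}
  case "$cloud" in
    gke|azure)
      # Known provider: the chart creates the matching ConfigMap itself.
      $RUN helm install "$CHART" -n tf-job --wait --replace --set cloud="$cloud"
      ;;
    *)
      # Any other value (or none): the chart creates no ConfigMap,
      # so one is created by hand before installing.
      $RUN kubectl create configmap tf-job-operator-config --from-file="$CONFIG_FILE"
      $RUN helm install "$CHART" -n tf-job --wait --replace
      ;;
  esac
}
```

Calling `deploy gke` or `deploy azure` prints a single `helm install` line, while a bare `deploy` prints the `kubectl create configmap` step followed by the install. Whether the operator expects a particular key name inside the ConfigMap is not stated in the thread, so treat the `--from-file` form as an assumption.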
Thanks!
`cloud` value that accepts `gke` or `azure` and will create the corresponding `ConfigMap`
`custom-config.yaml` file at the chart's root allowing users to specify a custom config. If `cloud` is not passed, `custom-config.yaml` will be used instead.

Remarks:
`config-file` flag, because in order to import a file in a Helm template, this file has to be at the root of the chart. Not doing so will result in a silent error (the ConfigMap will simply not be created).
Since this provides a bad user experience, I chose instead to direct users to modify the `custom-config.yaml` that will already be present at the root of the chart.