
Scope tf-operator to a namespace #789

Merged · 3 commits · Aug 22, 2018

Conversation

@ankushagarwal commented Aug 21, 2018

Based on the --namespace flag, tf-operator will be scoped to a
particular namespace.

If the --namespace flag is not set or left empty, tf-operator will be
scoped cluster-wide. If it is set to a namespace, tf-operator will only
operate in that namespace.

Fixes #759

/cc @gaocegege @jlewi


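For illustration, a minimal Go sketch of the flag-scoping behavior described above; the flag name and the v1.NamespaceAll default match the review diffs further down, but the in-cluster config and informer wiring here are assumptions, not the PR's actual code:

```go
package main

import (
	"flag"
	"log"
	"time"

	v1 "k8s.io/api/core/v1"
	kubeinformers "k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	var namespace string
	// v1.NamespaceAll is the empty string, so an unset --namespace flag
	// already means "watch every namespace".
	flag.StringVar(&namespace, "namespace", v1.NamespaceAll,
		"Namespace to operate in; empty scopes the operator cluster-wide.")
	flag.Parse()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("building in-cluster config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A filtered factory lists and watches only the given namespace; with
	// v1.NamespaceAll it behaves like the unfiltered, cluster-wide factory.
	factory := kubeinformers.NewFilteredSharedInformerFactory(
		client, 30*time.Second, namespace, nil)
	_ = factory // wire up TFJob/Pod informers here
}
```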

Based on the KUBEFLOW_NAMESPACE environment variable, tf-operator will be
scoped to a particular namespace.

If KUBEFLOW_NAMESPACE is not set or left empty, tf-operator will be
scoped cluster-wide. If it is set to a namespace, tf-operator will only
operate in that namespace.
@ankushagarwal (Author)

Once we merge this, we can create a new image and then I will update the tfjob prototype to parametrize this.

@ankushagarwal (Author)

/retest

@gaocegege (Member)

/lgtm

@gaocegege (Member)

/retest

@jlewi (Contributor) commented Aug 21, 2018

Why an environment variable and not a command line option?

A command line option seems more discoverable to me; e.g. run the binary with --help.

And in the intended use case we aren't really inheriting the namespace from the environment. I think of the namespace as something we will explicitly set, as opposed to something inherited, e.g. the NAMESPACE of the pod.

I don't think we want the namespace the operator is handling to be the same as the namespace the operator is running in. This should allow further separation between "core kubeflow" and user space.
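For comparison, a small runnable sketch of the two approaches under discussion; the KUBEFLOW_NAMESPACE and --namespace names come from this thread, everything else is illustrative:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Inherited: silently picks up whatever the pod environment provides,
	// and is invisible in --help output.
	envNamespace := os.Getenv("KUBEFLOW_NAMESPACE")

	// Explicit: discoverable via --help and set deliberately, independent
	// of the namespace the operator pod itself runs in.
	flagNamespace := flag.String("namespace", "",
		"Namespace the operator handles; empty means cluster-wide.")
	flag.Parse()

	fmt.Printf("env: %q, flag: %q\n", envNamespace, *flagNamespace)
}
```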

@k8s-ci-robot removed the lgtm label Aug 21, 2018
@ankushagarwal (Author)

Done.

@gaocegege (Member)

/retest

@gaocegege (Member)

/lgtm

@ScorpioCPH (Member) left a comment

Generally LGTM.

```diff
@@ -65,6 +65,12 @@ func Run(opt *options.ServerOption) error {
 		log.Infof("EnvKubeflowNamespace not set, use default namespace")
 		namespace = metav1.NamespaceDefault
 	}
+	if len(opt.Namespace) == 0 {
+		opt.Namespace = v1.NamespaceAll
```
Member

How about setting the default value in fs.StringVar()?

Author

Done

```diff
@@ -40,6 +41,10 @@ func (s *ServerOption) AddFlags(fs *flag.FlagSet) {
 		`The url of the Kubernetes API server,
 		will overrides any value in kubeconfig, only required if out-of-cluster.`)
 
+	fs.StringVar(&s.Namespace, "namespace", "",
```
Member

We can set the default value to v1.NamespaceAll here.
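For illustration, a minimal sketch of the reviewer's suggestion, with ServerOption trimmed to the one relevant field; the flag description string is an assumption, not the repository's actual text:

```go
package options

import (
	"flag"

	v1 "k8s.io/api/core/v1"
)

// ServerOption holds the operator's command-line settings (trimmed here to
// the field under discussion).
type ServerOption struct {
	Namespace string
}

// AddFlags registers --namespace with v1.NamespaceAll ("") as the default,
// so Run() no longer needs a separate empty-string check.
func (s *ServerOption) AddFlags(fs *flag.FlagSet) {
	fs.StringVar(&s.Namespace, "namespace", v1.NamespaceAll,
		"The namespace to which tf-operator is scoped; empty means all namespaces.")
}
```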

@jlewi (Contributor) commented Aug 22, 2018

nit: please update the PR description to reflect the fact that it's an argument now, not an environment variable.

@k8s-ci-robot removed the lgtm label Aug 22, 2018
@ankushagarwal (Author)

Addressed comments. PTAL.

@jlewi (Contributor) commented Aug 22, 2018

Looks like a transient SSL failure:

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/test-data-volume/kubeflow-tf-operator-presubmit-v1-789-b5ed4cb-957-1c49/src/kubeflow/tf-operator/py/deploy.py", line 360, in <module>
    main()
  File "/mnt/test-data-volume/kubeflow-tf-operator-presubmit-v1-789-b5ed4cb-957-1c49/src/kubeflow/tf-operator/py/deploy.py", line 356, in main
    args.func(args)
  File "/mnt/test-data-volume/kubeflow-tf-operator-presubmit-v1-789-b5ed4cb-957-1c49/src/kubeflow/tf-operator/py/deploy.py", line 139, in setup_cluster
    util.create_cluster(gke, project, zone, cluster_request)
  File "/mnt/test-data-volume/kubeflow-tf-operator-presubmit-v1-789-b5ed4cb-957-1c49/src/kubeflow/testing/py/kubeflow/testing/util.py", line 170, in create_cluster
    create_op = wait_for_operation(gke, project, zone, response["name"])
  File "/mnt/test-data-volume/kubeflow-tf-operator-presubmit-v1-789-b5ed4cb-957-1c49/src/kubeflow/testing/py/kubeflow/testing/util.py", line 235, in wait_for_operation
    projectId=project, zone=zone, operationId=op_id).execute()
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 839, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 179, in _retry_request
    raise exception
ssl.SSLError: ('The read operation timed out',)
```

/test all

@jlewi (Contributor) commented Aug 22, 2018

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
