Use ksonnet to easily define TFJobs to be run as tests #374
* This will be used as a replacement for Helm.
* The ksonnet template is used to run a K8s Job that runs the E2E test; ksonnet makes it easy to parameterize the test (e.g., namespace, image).
* This will be used to add an E2E test to our ksonnet repository that actually verifies we can successfully submit jobs (kubeflow/kubeflow#207).
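As a rough illustration of what "parameterizing the test" means, the sketch below builds the kind of Kubernetes Job manifest such a template would emit, with namespace and image as inputs. It is written in Python rather than ksonnet/Jsonnet, and the helper name, container name, image, and command are hypothetical; only the batch/v1 Job structure is standard Kubernetes.

```python
import json


def build_test_job(name, namespace, image, command):
    """Return a batch/v1 Job manifest (as a dict) that runs one E2E test.

    Hypothetical helper that mirrors what a parameterized template would emit.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Do not retry a failed test run; a failure should surface as-is.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Job pods must use restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "test", "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = build_test_job(
        name="tfjob-e2e",
        namespace="test",
        image="gcr.io/example/test-runner:latest",  # hypothetical image
        command=["python", "-m", "hypothetical_e2e_test"],  # hypothetical module
    )
    print(json.dumps(job, indent=2))
```

Changing the namespace or image only changes the arguments, not the manifest structure, which is the property a template gives you.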
Not sure this PR is what we want.
@gaocegege this is ready for review.
Generally LGTM! Only some nits.
RUN mkdir -p /opt/kubeflow/samples
COPY tf_smoke.py /opt/kubeflow/samples/
RUN chmod a+x /opt/kubeflow/samples/*
I think we should keep the operator image clean and small, but it's not a high priority; the code works for me.
That's a good point. I think that will make sense once our release infra is more mature and it's easier to build multiple images.
Practically speaking, the sample added in this PR takes up essentially no space. The biggest contributor to image size is probably Node.js for the UI, not the operator.
SGTM. I will file an issue, and we can just keep it in this PR.
Actually, I am wondering whether we should separate the UI and the operator. Maybe the UI should live in an independent repo.
{
  "server": "https://35.229.18.238",
  "namespace": "test"
}
Please add a newline at the end of the file 😄
@@ -0,0 +1,4 @@
{
  "server": "https://35.229.18.238",
What is this IP? I think we should avoid hard-coded addresses.
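One possible way to keep the address out of the test code, sketched under assumptions: keep spec.json as the default, but let environment variables override it at run time. The helper names and the TEST_K8S_SERVER / TEST_NAMESPACE variables are hypothetical, not part of this PR.

```python
import json
import os


def load_spec(path):
    """Read an environment spec file such as spec.json."""
    with open(path) as f:
        return json.load(f)


def resolve_environment(spec, env=None):
    """Return a copy of spec whose server/namespace may be overridden.

    TEST_K8S_SERVER and TEST_NAMESPACE are hypothetical variable names.
    """
    env = os.environ if env is None else env
    resolved = dict(spec)  # do not mutate the caller's spec
    resolved["server"] = env.get("TEST_K8S_SERVER", spec.get("server"))
    resolved["namespace"] = env.get("TEST_NAMESPACE", spec.get("namespace"))
    return resolved
```

Passing `env` explicitly keeps the override logic testable without touching the real process environment.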
py/test_runner.py
name = None
namespace = None
for pair in args.params.split(","):
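The loop above parses a comma-separated list of key=value pairs. A minimal, self-contained version might look like this; the function name `parse_params` and its error handling are assumptions, not code from this PR.

```python
def parse_params(params):
    """Parse "k1=v1,k2=v2" into a dict, treating every key the same way."""
    result = {}
    for pair in params.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        # Split on the first "=" only so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % pair)
        result[key.strip()] = value.strip()
    return result


params = parse_params("name=simple-tfjob,namespace=test,image=gcr.io/example/tf:1.0")
name = params.get("name")
namespace = params.get("namespace")
```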
I was wondering why we don't define two separate arguments, namespace and name 🤔
Review status: 0 of 24 files reviewed at latest revision, 4 unresolved discussions.

py/test_runner.py, line 39 at r3: Previously, gaocegege (Ce Gao) wrote…
It seemed better to treat name and namespace consistently with all the other parameters rather than giving them special treatment.

test/workflows/environments/jlewi-test/spec.json, line 2 at r3: Previously, gaocegege (Ce Gao) wrote…
It's the K8s master associated with this environment.

test/workflows/environments/jlewi-test/spec.json, line 4 at r3: Previously, gaocegege (Ce Gao) wrote…
Done.

Comments from Reviewable
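The trade-off discussed here, one generic `--params` flag versus dedicated `--name` and `--namespace` flags, can be sketched with argparse. Only `params` comes from the PR; the helper `required_params` and its required-key check are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run a templated TFJob test.")
    # A single generic flag: name and namespace are passed like any other param.
    parser.add_argument(
        "--params", default="",
        help="Comma-separated key=value pairs, e.g. name=foo,namespace=bar")
    return parser


def required_params(params_str, required=("name", "namespace")):
    """Parse params and exit with an error if any required key is missing."""
    params = dict(p.split("=", 1) for p in params_str.split(",") if p)
    missing = [k for k in required if k not in params]
    if missing:
        raise SystemExit("missing required params: " + ", ".join(missing))
    return params


args = build_parser().parse_args(["--params", "name=smoke,namespace=test"])
params = required_params(args.params)
```

Keeping name and namespace inside `--params` lets the runner forward every parameter to the template uniformly; the cost is that required keys have to be checked explicitly instead of by argparse.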
@gaocegege PTAL. This is one of the PRs I'd like to get submitted before we move the repository to kubeflow.
Generally LGTM, but the test timed out: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/tensorflow_k8s/374/tf-k8s-presubmit/517/
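A runner that waits on a TFJob needs an explicit deadline so a hung job fails the presubmit with a clear error instead of stalling it. A minimal polling sketch, assuming a caller-supplied `get_phase` function (hypothetical; a real runner would read the TFJob status through the Kubernetes API). The default terminal phase names are placeholders and should be set to whatever the operator actually reports.

```python
import time


def wait_for_job(get_phase, timeout_s, poll_interval_s=5.0,
                 terminal=("Succeeded", "Failed"),
                 clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until it returns a terminal phase or the deadline passes.

    Returns the terminal phase; raises TimeoutError if the deadline is hit first.
    """
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase in terminal:
            return phase
        if clock() >= deadline:
            raise TimeoutError("job still %r after %.0fs" % (phase, timeout_s))
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the helper testable without real waiting.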
LGTM
All tests passed.
We want to easily run TFJobs corresponding to different TFJob specs (see "Improve our test harness to make it easy to write lots of E2E tests" #373).
We currently do this with Jinja2 templates and a Python script, test_runner.py, that runs those templates.
This PR migrates to using ksonnet to define those templates.
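With ksonnet, each test variant becomes a component whose parameters are set per environment and then applied. The sketch below shows how a Python runner could build those commands, assuming the ksonnet CLI forms `ks param set <component> <param> <value> --env=<env>` and `ks apply <env> -c <component>`. The component name is hypothetical, and the commands are only constructed and printed here, not executed.

```python
import shlex


def ks_commands(component, env, params):
    """Build the ks CLI invocations that configure and deploy one test component."""
    cmds = []
    for key, value in sorted(params.items()):
        # Every parameter, including name and namespace, is set the same way.
        cmds.append(["ks", "param", "set", component, key, str(value),
                     "--env=" + env])
    cmds.append(["ks", "apply", env, "-c", component])
    return cmds


for cmd in ks_commands("simple-tfjob", "jlewi-test",
                       {"name": "smoke", "namespace": "test"}):
    print(" ".join(shlex.quote(c) for c in cmd))
```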
This PR is prework for running some TFJob E2E tests as part of the ksonnet tests (see "E2E tests need to verify that we can submit a TFJob", kubeflow#207).
Cleanup