Update katib manifests #1903

Closed
texasmichelle opened this issue Nov 4, 2018 · 9 comments · Fixed by #1904

Comments

@texasmichelle
Member

Following the public docs, I ran into the error reported here, which was patched in kubeflow/katib.

Related changes need to be made in this repo. What is the logic behind storing manifests in both places? How have we kept them in sync in the past and which reflects the source of truth?

@texasmichelle
Member Author

After applying the patch, I'm still not seeing any metrics in the UI. @gaocegege @YujiOshima Can you help identify the problem? This is from the v0.3-branch.

[Screenshot: 2018-11-04, 10:32:57 AM]

@texasmichelle
Member Author

Definitions for workerConfigMap in kubeflow/kubeflow & kubeflow/katib are also out of sync.

These discrepancies are preventing me from getting any of the examples to run.
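One way to surface this kind of drift is to diff the two copies of a definition directly. The following is a minimal sketch, assuming both copies are available locally as text; the function name `manifest_drift` and the sample snippets are hypothetical and are not taken from either repo.

```python
import difflib
from typing import List


def manifest_drift(left: str, right: str,
                   left_name: str = "kubeflow/kubeflow",
                   right_name: str = "kubeflow/katib") -> List[str]:
    """Return unified-diff lines between two manifest copies (empty if identical)."""
    return list(
        difflib.unified_diff(
            left.splitlines(),
            right.splitlines(),
            fromfile=left_name,
            tofile=right_name,
            lineterm="",
        )
    )


# Hypothetical snippets standing in for the two workerConfigMap definitions.
copy_a = "kind: ConfigMap\nmetadata:\n  name: worker-template\n  namespace: kubeflow\n"
copy_b = "kind: ConfigMap\nmetadata:\n  name: worker-template\n  namespace: katib\n"

drift = manifest_drift(copy_a, copy_b)
for line in drift:
    print(line)
```

An empty result means the two copies match; any output lists the lines that differ between them.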

@texasmichelle
Member Author

Also, I'm seeing some objects created in the kubeflow namespace and others in katib - is that intended behavior for our default install?

@richardsliu
Contributor

@YujiOshima Can you look into this? Thanks.

@richardsliu
Contributor

Looks like we have CRD duplication across other components as well, for example tf-operator.

I think we need to add an e2e test for katib in kubeflow/kubeflow, since we are installing katib by default. The process should be:

  1. Make your changes in kubeflow/katib;
  2. Push Docker images (automatically done after post submits);
  3. Upgrade the manifest on kubeflow/kubeflow:
    • Update the image tag on katib components
    • If needed, also update the libsonnet files
    • Make sure the e2e test passes.

That way we can at least make sure that the main repo components work out of the box.

@richardsliu
Contributor

Also, for the examples, we should change all the namespaces to kubeflow instead of katib, since the former is used in all the public documentation.

@YujiOshima
Contributor

In v0.3-branch, the image of studyjob-controller is katib/studyjob-controller.
https://github.com/kubeflow/kubeflow/blob/v0.3-branch/kubeflow/katib/prototypes/all.jsonnet#L15

This image is built from the latest Katib code.
That causes an API mismatch with the other components, since they use the gcr.io/kubeflow-images-public/katib/ ... :v0.1.2-alpha-45-g3dce496 version.

Please note that {{.StudyId}}, {{TrialId}}, and {{.WorkerId}} should not be capitalized in this version.

@texasmichelle
Member Author

Is the fix for v0.3-branch to revert the use of the untagged image to katib/studyjob-controller:v0.1.2-alpha-45-g3dce496 or to use a newly generated tag consistently for every image in all.jsonnet?
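Either option comes down to every image carrying the same tag. As a rough illustration, the sketch below groups image references by tag and flags untagged ones. It assumes the manifest exposes images as quoted `image: "..."` strings, which may not match how all.jsonnet declares them. Apart from `katib/studyjob-controller` and the `v0.1.2-alpha-45-g3dce496` tag mentioned above, the component names (`example-manager`, `example-ui`) are hypothetical placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed format: image: "repo/name:tag" or image: "repo/name" (no tag).
IMAGE_RE = re.compile(r'image:\s*"([^"]+)"')


def split_tag(image: str) -> Tuple[str, Optional[str]]:
    """Split an image reference into (name, tag); tag is None when absent."""
    # Only look for ':' in the last path segment so registry ports are not mistaken for tags.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = image.rsplit(":", 1)
        return name, tag
    return image, None


def images_by_tag(manifest_text: str) -> Dict[Optional[str], List[str]]:
    """Group image names found in the manifest text by their tag."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for image in IMAGE_RE.findall(manifest_text):
        name, tag = split_tag(image)
        groups[tag].append(name)
    return dict(groups)


# Hypothetical manifest excerpt; only the first image name and the tag come from this thread.
sample = '''
image: "katib/studyjob-controller"
image: "gcr.io/kubeflow-images-public/katib/example-manager:v0.1.2-alpha-45-g3dce496"
image: "gcr.io/kubeflow-images-public/katib/example-ui:v0.1.2-alpha-45-g3dce496"
'''

groups = images_by_tag(sample)
untagged = groups.get(None, [])
consistent = len(groups) == 1 and None not in groups
print("untagged images:", untagged)
print("all images share one tag:", consistent)
```

A check like this could run alongside the e2e test so that a single untagged or mismatched image is caught before release.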

@texasmichelle
Member Author

texasmichelle commented Nov 14, 2018

Created katib-245 for changing katib namespace -> kubeflow in all manifests, not just examples.

Created website-293 for the corresponding docs update.
