Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to parse NaN value in Trial CR #1227

Closed
andreyvelich opened this issue Jun 20, 2020 · 8 comments · Fixed by #1245
Closed

Unable to parse NaN value in Trial CR #1227

andreyvelich opened this issue Jun 20, 2020 · 8 comments · Fixed by #1245

Comments

@andreyvelich
Copy link
Member

/kind bug

I tried to run Experiment with additional metric names that were not collected from the Trial pod.

Because of that, in Trial Controller we try to write math.NaN() value to observation and Trial can't be updated.

I got error here while comparing Trial statuses.
I am not sure that Kubernetes objects supports math.NaN(), since it must be valid JSON.

We should think how we can handle empty metric results.

This issue can be related to #889.

/cc @sperlingxx @gaocegege @johnugeorge

@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the labels:

Label Probability
area/katib 0.83

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@issue-label-bot
Copy link

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@sperlingxx
Copy link
Member

In addition, I think we should fail the trial if job is succeed without any ObservationLog, which are usually caused by incorrect metrics collecting configuration. Currently, trial status will be locked at pending under this situation.

@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the labels:

Label Probability
area/front-end 0.53

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@andreyvelich
Copy link
Member Author

@sperlingxx As you can see here: https://github.com/kubeflow/katib/blob/master/pkg/controller.v1beta1/trial/trial_controller_util.go#L126-L139, we convert Trial to succeeded state only if any observation was reported.

My suggestion for this issue is to convert all API metrics parameters to "string".

  1. As you can see here: Support floats? kubernetes-sigs/controller-tools#245 it is recommended by Kubernetes.
  2. We can add initial state for max, min, latest as "not reported". So user can see which metrics were reported and which were not.

What do you think @sperlingxx @gaocegege @johnugeorge ?

@gaocegege
Copy link
Member

SGTM.

@johnugeorge
Copy link
Member

Sgtm

@andreyvelich
Copy link
Member Author

Will make this change.
/assign @andreyvelich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants