Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String type for metric values #1245

Merged

Conversation

andreyvelich
Copy link
Member

Fixes: #1227.

I changed metric value from float64 to string.
If metrics can't be reported, unavailable value will be recorded.

I added condition that metric.Latest can't be unavailable here: https://github.com/andreyvelich/katib/blob/cd6b37b3614cd226781c0feac1dcf6d568d07f01/pkg/controller.v1beta1/trial/trial_controller_util.go#L133-L135.
If Objective metric values were not reported, Trial will be in False status and Succeeded condition with this message:
Warning MetricsUnavailable 2m11s (x2 over 2m11s) trial-controller Metrics are not available for Job.

Experiment will be still running and waiting until at least one Objective metric value will be reported.

I think it is correct workflow, since user can see that Objective metric is not printed by training container, modify the training container image and manually restart the Trial, by deleting appropriate BatchJob, TFJob, etc.. .

What do you think @sperlingxx @gaocegege @johnugeorge ?

/assign @gaocegege @johnugeorge @sperlingxx

@kubeflow-bot
Copy link

This change is Reviewable

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sperlingxx
Copy link
Member

This PR sounds good to me, except I still fill not good about pending trial if its job succeed but metrics unavailable. Considering no warning messages will be shown on user interface, when this condition happens.
/lgtm

@andreyvelich
Copy link
Member Author

This PR sounds good to me, except I still fill not good about pending trial if its job succeed but metrics unavailable. Considering no warning messages will be shown on user interface, when this condition happens.
/lgtm

Maybe we could notify user about this case in the Experiment monitor page in the Katib UI?
What do you think @gaocegege @johnugeorge @sperlingxx ?

@gaocegege
Copy link
Member

SGTM. It can be in another PR.

Copy link
Member Author

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 29b797c into kubeflow:master Jun 30, 2020
sperlingxx pushed a commit to sperlingxx/katib that referenced this pull request Jul 9, 2020
* Change metric type from float to string

* Check unavailable latest objective metric in isTrialObservationAvailable

* Fix e2e test

* Delete consts.UnavailableMetricValue from e2e
@andreyvelich andreyvelich deleted the issue-1227-string-metrics-api branch October 6, 2021 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to parse NaN value in Trial CR
6 participants