Use Kubeflow metadata for metrics collection #862

Open
jlewi opened this issue Oct 8, 2019 · 6 comments

@jlewi
Contributor

jlewi commented Oct 8, 2019

/kind feature

Describe the solution you'd like
Right now Katib depends on logging the metrics to stdout (see #685).

It would be nice if instead Katib could be configured to use Kubeflow metadata to obtain the metrics.

Here's a strawman for how this might work:

  1. The user adds logging statements to their code to log metrics to metadata with an appropriate set of labels (e.g. experiment & trial)
  2. Katib uses a selector to match trials to metrics in metadata
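
To make the strawman concrete, here is a minimal sketch of label-based logging and selection. It uses a hypothetical in-memory store (`InMemoryMetadataStore`, `log_metric`, `select`, and the label values are all made up for illustration) and does not use the actual Kubeflow metadata SDK, whose API is not shown in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class MetricRecord:
    name: str
    value: float
    labels: dict = field(default_factory=dict)


class InMemoryMetadataStore:
    """Hypothetical stand-in for a metadata backend; not the Kubeflow metadata SDK."""

    def __init__(self):
        self._records = []

    def log_metric(self, name, value, labels):
        # Copy the labels so later changes by the caller do not alter the record.
        self._records.append(MetricRecord(name, value, dict(labels)))

    def select(self, selector):
        """Return records whose labels contain every key/value pair in selector."""
        return [
            r for r in self._records
            if all(r.labels.get(k) == v for k, v in selector.items())
        ]


store = InMemoryMetadataStore()

# Step 1: training code logs metrics with experiment and trial labels.
store.log_metric("accuracy", 0.91, {"experiment": "exp-a", "trial": "trial-1"})
store.log_metric("accuracy", 0.87, {"experiment": "exp-a", "trial": "trial-2"})

# Step 2: the tuner uses a selector to find the metrics for one trial.
trial_1 = store.select({"experiment": "exp-a", "trial": "trial-1"})
```

The selector here is a plain equality match on labels, similar in spirit to Kubernetes label selectors; a real integration might need something richer.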

It seems natural for folks to instrument their code to log metrics to metadata.

Furthermore, using the metadata SDK to log metrics should mean logging metrics to metadata is no more difficult than logging to stdout.

A side benefit would be that this avoids some of the side effects of using sidecars to fetch logs from stdout (#685)

  • Sidecars make it more difficult to determine when a job is completed.
  • Logging to metadata makes it easier to write robust code to ensure that metrics are logged
    • Training code gets an ACK from the metadata store and can retry in the event of failure
    • In contrast, if we rely on training code printing to stdout and the output being collected asynchronously, the training code has no way of knowing whether metrics have been successfully preserved.
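
The ACK-and-retry point can be sketched roughly as follows. `FlakyStore` and `log_with_retry` are hypothetical names invented for this example, and the store simply simulates a backend that fails a couple of times before acknowledging a write; it is not how the metadata service actually behaves.

```python
import time


class FlakyStore:
    """Hypothetical store that fails a fixed number of times before acknowledging."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.saved = []

    def write(self, name, value):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("write not acknowledged")
        self.saved.append((name, value))
        return True  # the ACK


def log_with_retry(store, name, value, attempts=3, delay_seconds=0.0):
    """Retry the write until the store acknowledges it or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return store.write(name, value)
        except ConnectionError:
            if attempt == attempts:
                raise  # give up loudly so the training code knows the metric was lost
            time.sleep(delay_seconds)


store = FlakyStore(failures_before_success=2)
acked = log_with_retry(store, "loss", 0.25)
```

With stdout there is no equivalent of the return value or the raised exception, which is the asymmetry described above.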

/cc @zhenghuiwang @johnugeorge @gaocegege

@hougangliu
Member

hougangliu commented Oct 9, 2019

@jlewi @zhenghuiwang
In fact, all metrics are already persisted into the Katib DB (currently only a MySQL driver is implemented), and we can implement a new DB driver for Kubeflow metadata, just like the MySQL counterpart.
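
A rough sketch of what a pluggable driver could look like, purely for illustration. Katib's real DB interface is not shown in this thread, so `MetricsBackend`, `SqlBackend`, and `MetadataBackend` are hypothetical names; SQLite stands in for MySQL so the example runs without a server, and the metadata driver is just a dictionary.

```python
import sqlite3
from abc import ABC, abstractmethod


class MetricsBackend(ABC):
    """Hypothetical driver interface; not Katib's actual DB interface."""

    @abstractmethod
    def save(self, trial, name, value):
        """Persist one metric observation for a trial."""

    @abstractmethod
    def load(self, trial):
        """Return a list of (name, value) pairs for a trial."""


class SqlBackend(MetricsBackend):
    """SQLite stands in for the MySQL driver here."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE metrics (trial TEXT, name TEXT, value REAL)")

    def save(self, trial, name, value):
        self._conn.execute("INSERT INTO metrics VALUES (?, ?, ?)", (trial, name, value))

    def load(self, trial):
        rows = self._conn.execute(
            "SELECT name, value FROM metrics WHERE trial = ? ORDER BY rowid", (trial,)
        )
        return [(name, value) for name, value in rows]


class MetadataBackend(MetricsBackend):
    """Dictionary-backed placeholder for a Kubeflow metadata driver."""

    def __init__(self):
        self._by_trial = {}

    def save(self, trial, name, value):
        self._by_trial.setdefault(trial, []).append((name, value))

    def load(self, trial):
        return list(self._by_trial.get(trial, []))


def record_trial(backend):
    # Caller code is identical regardless of which driver is plugged in.
    backend.save("trial-1", "accuracy", 0.9)
    return backend.load("trial-1")
```

Calling `record_trial(SqlBackend())` and `record_trial(MetadataBackend())` goes through the same code path, which is the property a new driver would need to preserve.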

@jlewi
Contributor Author

jlewi commented Oct 9, 2019

Out-of-the-box integration with metadata would be awesome.

@gaocegege
Member

Not sure about the requirements of metadata. Right now we only use katib-db to store metrics. If metadata does not require any other abstraction, I think it should be easy to support it.

@johnugeorge
Member

Related: #841 (comment)

@stale

stale bot commented Nov 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Member

/lifecycle frozen
