Katib 2019 Roadmap

This document provides a high-level view of where Katib will grow in 2019. These objectives are based on Katib's Critical User Journey (CUJ), which can be found here.

The original Katib design document can be found here.

Katib 1.0 Readiness

Stabilize APIs for StudyJobs
- Beta by end of Q2, 1.0 by end of Q4
- Formalize naming conventions (names such as katib and vizier are currently used inconsistently in different places)
- Refactor studyjob field names #351
- Rename fields so their names are more meaningful (e.g. requestCount vs requestNumber) #161
Fully integrate Katib with existing E2E examples:
- XGBoost
- MNIST
- GitHub issue summarization
Publish API documentation, best practices, tutorials
Issues list
Issues for 0.5.0 release

Enhance HP Tuning Experience

The objectives here are organized around the three stages defined in the CUJ:

1. Defining Model and Parameters

Integration with Kubeflow (KF) distributed training components
- TFJob
- PyTorch
- Allow Katib to support other operator types generically #341

2. Configuring a Study

Streamlining the StudyJob schema - providing simpler ways to write worker specs and metric collector specs.
Expose more information in StudyJob status fields
- List all job conditions with details #344
- Returning study metadata such as number of trials and best hyperparameter values so far #356
Integration with Jupyter notebooks and Fairing #355
- Allow users to start with an existing model from a notebook and do HP tuning with minimal code changes
Allowing a StudyJob to be resumed with additional trials #346
Generating StudyJob configurations and launching StudyJobs through UI
Supporting additional suggestion algorithms #15
Support for StudyJob deployment in a different namespace #343
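
To make the study metadata item (#356) concrete, the following is a minimal sketch of how a status summary with the number of trials and the best hyperparameter values so far could be derived. It assumes a simplified in-memory `Trial` record; the `summarize` helper and the keys it returns (`trials`, `trialsCompleted`, `bestObjective`, `bestParams`) are hypothetical names chosen for illustration, not fields of the actual StudyJob API.

```python
from typing import Dict, List, NamedTuple, Optional


class Trial(NamedTuple):
    """Hypothetical record of one trial; not a Katib type."""
    params: Dict[str, float]
    objective: Optional[float]  # None while the trial has not reported a result


def summarize(trials: List[Trial], maximize: bool = True) -> dict:
    """Build an illustrative status summary from the trials seen so far."""
    completed = [t for t in trials if t.objective is not None]
    best = None
    if completed:
        pick = max if maximize else min
        best = pick(completed, key=lambda t: t.objective)
    return {
        "trials": len(trials),
        "trialsCompleted": len(completed),
        "bestObjective": best.objective if best else None,
        "bestParams": dict(best.params) if best else None,
    }


if __name__ == "__main__":
    study = [
        Trial({"lr": 0.1}, 0.82),
        Trial({"lr": 0.01}, 0.91),
        Trial({"lr": 0.001}, None),  # still running
    ]
    print(summarize(study))
```

A summary of this shape can be recomputed each time a trial finishes, so the reported best values stay current while the study is still in progress.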

3. Tracking Model Performance

Enhance metrics collection
- May need to revisit the design: use a push model instead of a pull model?
UI enhancements: allowing data scientists to visualize results more easily
Support for persistent model and metadata storage
- Ideally, users should be able to export and reuse trained models from common storage
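
The push versus pull question raised for metrics collection can be illustrated with a small sketch. It is only a conceptual comparison under stated assumptions: `MetricStore`, `pull_collect`, and `push_report` are hypothetical names, the store is an in-memory stand-in, and the `name=value` log format is assumed for the example rather than taken from Katib.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetricStore:
    """Hypothetical in-memory stand-in for wherever trial metrics are kept."""
    records: Dict[str, List[float]] = field(default_factory=dict)

    def report(self, trial: str, value: float) -> None:
        self.records.setdefault(trial, []).append(value)


def pull_collect(store: MetricStore, trial: str,
                 read_log: Callable[[], str], metric_name: str) -> None:
    """Pull model: a collector reads the worker's log and parses metrics out of it."""
    for line in read_log().splitlines():
        key, sep, raw = line.partition("=")
        if sep and key.strip() == metric_name:
            store.report(trial, float(raw))


def push_report(store: MetricStore, trial: str, value: float) -> None:
    """Push model: the training code reports each metric itself as it is produced."""
    store.report(trial, value)


if __name__ == "__main__":
    store = MetricStore()
    # Pull: the collector depends on the log format the worker happens to emit.
    pull_collect(store, "trial-a", lambda: "step=1\naccuracy=0.91\naccuracy=0.93", "accuracy")
    # Push: the worker calls the reporting function directly.
    push_report(store, "trial-b", 0.95)
    print(store.records)
```

In this sketch the pull path is coupled to the worker's log format, while the push path requires the training code to call a reporting function. That trade-off is one way to frame the design question.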

Other Features

Designs are pending for the following new features:

Multi-Tenancy Support
Neural architecture search (NAS)
Batch scheduling
Integration with Pipelines
Early stopping feature

Test and Release Infrastructure

Improve e2e test coverage
Improve test harness
Enhance the release process by adding automation (see https://bit.ly/2F7o4gM)
