Katib 2019 Roadmap

This document provides a high-level view of where Katib will grow in 2019. These objectives are based on Katib's Critical User Journey (CUJ), which can be found here.

The original Katib design document can be found here.

Katib 1.0 Readiness

  • Stabilize APIs for StudyJobs
    • Beta by end of Q2, 1.0 by end of Q4
    • Formalize naming conventions (we use different names like katib vs vizier in different places)
    • Refactor studyjob field names #351
    • Rename fields so their names are more meaningful (e.g. requestCount vs requestNumber) #161
  • Fully integrate Katib with existing E2E examples:
    • XGBoost
    • MNIST
    • GitHub issue summarization
  • Publish API documentation, best practices, tutorials
  • Issues list
  • Issues for 0.5.0 release

Enhance HP Tuning Experience

The objectives here are organized around the three stages defined in the CUJ:

1. Defining Model and Parameters

Integration with Kubeflow (KF) distributed training components:

  • TFJob
  • PyTorch
  • Allow Katib to support other operator types generically #341

2. Configuring a Study

  • Streamline the StudyJob schema by providing simpler ways to write worker specs and metric collector specs
  • Expose more information in StudyJob status fields
    • List all job conditions with details #344
    • Return study metadata such as the number of trials and the best hyperparameter values so far #356
  • Integrate with Jupyter notebooks and Fairing #355
    • Allow users to start with an existing model from a notebook and do HP tuning with minimal code changes
  • Allow a StudyJob to be resumed with additional trials #346
  • Generate StudyJob configurations and launch StudyJobs through the UI
  • Support additional suggestion algorithms #15
  • Support StudyJob deployment in a different namespace #343
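
As an illustration of the study metadata item (#356), the sketch below shows one way a trial count and the best hyperparameter values so far could be derived from completed trials. The `Trial` and `StudySummary` types and the `summarize` function are hypothetical names chosen for this example; they are not Katib's actual schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    """Hypothetical record of one completed trial (not Katib's schema)."""
    parameters: Dict[str, float]
    objective: float


@dataclass
class StudySummary:
    """Hypothetical status summary: trial count and best trial so far."""
    trial_count: int
    best_parameters: Optional[Dict[str, float]]
    best_objective: Optional[float]


def summarize(trials: List[Trial], maximize: bool = True) -> StudySummary:
    """Build the summary a status field could expose after each trial."""
    if not trials:
        return StudySummary(0, None, None)
    pick = max if maximize else min
    best = pick(trials, key=lambda t: t.objective)
    return StudySummary(len(trials), dict(best.parameters), best.objective)


# Example data, invented for illustration only.
trials = [
    Trial({"learning_rate": 0.1, "num_layers": 2}, 0.91),
    Trial({"learning_rate": 0.01, "num_layers": 4}, 0.95),
    Trial({"learning_rate": 0.5, "num_layers": 3}, 0.80),
]
summary = summarize(trials)
print(summary.trial_count, summary.best_parameters, summary.best_objective)
```

Recomputing the summary from the full trial list keeps the sketch simple; a controller would more likely update it incrementally as each trial finishes.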

3. Tracking Model Performance

  • Enhance metrics collection
    • May need to revisit the design: use a push model instead of a pull model?
  • UI enhancements that let data scientists visualize results more easily
  • Support for persistent model and metadata storage
    • Ideally users should be able to export and reuse trained models from a common storage
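
To make the push versus pull question concrete, here is a minimal sketch contrasting the two collection styles against the same store. Every name in it (`MetricsStore`, `pull_collect`, `PushReporter`) is a hypothetical stand-in for illustration, not part of Katib.

```python
from typing import Callable, Dict, List


class MetricsStore:
    """Hypothetical in-memory metrics store keyed by trial ID."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def record(self, trial_id: str, value: float) -> None:
        self._data.setdefault(trial_id, []).append(value)

    def values(self, trial_id: str) -> List[float]:
        return list(self._data.get(trial_id, []))


def pull_collect(store: MetricsStore, trial_id: str,
                 read_metrics: Callable[[], List[float]]) -> None:
    """Pull model: a collector reads the worker's output and stores it."""
    for value in read_metrics():
        store.record(trial_id, value)


class PushReporter:
    """Push model: the worker reports each metric as it is produced."""

    def __init__(self, store: MetricsStore, trial_id: str) -> None:
        self._store = store
        self._trial_id = trial_id

    def report(self, value: float) -> None:
        self._store.record(self._trial_id, value)


store = MetricsStore()

# Pull: the collector fetches whatever the worker has written so far.
pull_collect(store, "trial-pull", lambda: [0.7, 0.8])

# Push: the training code calls the reporter directly.
reporter = PushReporter(store, "trial-push")
reporter.report(0.7)
reporter.report(0.8)

print(store.values("trial-pull"), store.values("trial-push"))
```

The trade-off the sketch surfaces: pull needs no changes to training code but depends on the collector knowing where and how to read output, while push requires the worker to call a reporting hook but delivers metrics as soon as they exist.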

Other Features

Designs are pending for the following new features:

Test and Release Infrastructure

  • Improve e2e test coverage
  • Improve test harness
  • Enhance the release process by adding automation (see https://bit.ly/2F7o4gM)