[WIP] Add enhancement for Parameter Distribution #2059

Closed
128 changes: 128 additions & 0 deletions docs/proposals/parameter-distribution.md
@@ -0,0 +1,128 @@
# Proposal for Parameter Distribution

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Proposal](#proposal)
- [Design Details](#design-details)
  - [Experiment API changes](#experiment-api-changes)
  - [Correspondence between Katib Distributions and Framework Distributions](#correspondence-between-katib-distributions-and-framework-distributions)
    - [Chocolate](#chocolate)
    - [Goptuna](#goptuna)
    - [Hyperopt](#hyperopt)
    - [Optuna](#optuna)
    - [Scikit-Optimize](#scikit-optimize)
<!-- /toc -->

## Summary
This enhancement introduces a `Distribution` field for tuning parameters and removes the now-redundant `ParameterType` field.

The new API field in the Experiment spec determines the parameter type through its distribution.

## Motivation
Currently, Katib does not let users specify the distribution from which samplers draw parameter values in the search space.

Katib should let users specify it, since
almost all hyperparameter tuning frameworks support user-specified distributions.

Member

Please can you link the appropriate issue: #1207 to this proposal motivation?

Member Author

Makes sense.


## Proposal
We introduce a mechanism that lets users specify a distribution for the search space.
This also requires a mechanism to propagate distributions to the suggestion services and
set them on the samplers of each framework.

## Design Details
The proposal consists of a new Experiment API field and
a correspondence between Katib distributions and framework distributions.

### Experiment API changes
We extend the Experiment API with a new field, `Distribution`, which configures the distribution for the search space, and
remove the redundant field `ParameterType`.

```diff
type ParameterSpec struct {
Name string `json:"name,omitempty"`
- ParameterType ParameterType `json:"parameterType,omitempty"`
+ Distribution Distribution `json:"distribution,omitempty"`
Member

Great proposal @tenzen-y . One question, how do we provide backward compatibility?

Member Author

@johnugeorge This is a good point.

For the time being (1~2 releases?), I think we can support ParameterType and Distribution concurrently.
This means that when users set ParameterType, suggestion services behave as they do now; when users set Distribution, suggestion services pass the distribution to the sampler.

Also, we should add webhook validation so that only one of ParameterType and Distribution can be set (they are mutually exclusive).

@andreyvelich @johnugeorge wdyt?
If you agree with this, I will add this to the Proposal.

Member

SGTM. Also, add deprecation tag to ParameterType

Member Author

Also, add deprecation tag to ParameterType

SGTM
I will add the tag to the following:

ParameterType parameter_type = 2; /// Type of the parameter.

Member Author

@tenzen-y, Dec 22, 2022

If we add only new features to v1beta2 API, deprecation labels are unnecessary since we create a separate proto definition for v1beta2 API as discussed in #2059 (comment).

FeasibleSpace FeasibleSpace `json:"feasibleSpace,omitempty"`
}

- type ParameterType string
+ type Distribution string

const (
- ParameterTypeUnknown ParameterType = "unknown"
- ParameterTypeDouble ParameterType = "double"
- ParameterTypeInt ParameterType = "int"
- ParameterTypeDiscrete ParameterType = "discrete"
- ParameterTypeCategorical ParameterType = "categorical"
+ UnknownDistribution Distribution = "unknown"
+ CategoricalDistribution Distribution = "categorical"
+ IntUniformDistribution Distribution = "intUniform"
Member

Should we use camel case here? Personally prefer lower case intuniform

Member Author

SGTM

+ IntLogUniformDistribution Distribution = "intLogUniform"
+ FloatUniformDistribution Distribution = "floatUniform"
+ FloatLogUniformDistribution Distribution = "floatLogUniform"
Comment on lines +59 to +62
Member

@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following hyperopt model instead of int and float model (e.g. uniform, quniform, loguniform, qloguniform) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.
Also, we can change step to q and integrate base parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.

Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform

Member Author

What do you think about following hyperopt model instead of int and float model (e.g. uniform, quniform, loguniform, qloguniform) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.

@andreyvelich Sounds good. I would add the corresponding tables for the old ParameterType and new Distribution using the hyperopt model to this proposal.

Also, we can change step to q and integrate base parameter for the log.

@andreyvelich Sounds good. One question: does "integrate base parameter for the log" mean adding a base field to the FeasibleSpace struct?

Member

SGTM. While I am thinking if this is a huge change to our YAML APIs.

Member Author

SGTM. While I am thinking if this is a huge change to our YAML APIs.

Maybe, we need to change the API version to v1beta2.

Member

SGTM

Member Author

SGTM

Is it possible to convert v1beta1 resource object to v1beta2? Will it drop some necessary info from the conversion?

I will create a correspondence table between v1beta1 and v1beta2. Maybe, we only need to create a table for the ParameterType and the FeasibleSpace.

When will the webhook be configured? Should we install it by default?

IIUC, we do not need to install manifests for conversion webhook to clusters.

ref:

And when will we deprecate v1beta1?

IMO, we need to keep maintaining v1beta1 for at least one release version. This means if we introduce v1beta2 API in katib v0.16.0, we will remove v1beta1 API in katib v0.17.0.

@gaocegege Do you know how many release versions we kept maintaining v1alpha2 after we introduced v1beta1?

Member Author

@tenzen-y, Dec 15, 2022

Yep, I think so. But we need a detailed design for this to see if it is possible.

@andreyvelich @gaocegege Custom (user-implemented) suggestion services that use the v1beta1 API will probably not work, since gRPC calls do not go through the conversion webhook.

<------------------------------ [Updated] ------------------------------
So, we probably need to separate the CRD version change from the introduction of Distribution. I will cover only the introduction of Distribution in this proposal; we can follow up on upgrading the CRD version in other issues and PRs.

- Introducing Distribution: we keep using ParameterType and introducing Distribution and Base to FeasibleSpace like the following.

#1207 (comment)

So, I would like to work in the following:

- Upgrading CRD version:

------------------------------ [Updated] ------------------------------>

  1. Introduce a new field representing the gRPC API version (v1beta1 or v1beta2) in the following katib-config struct, since the suggestion controller needs a different gRPC client for v1beta1 and v1beta2. This means we keep maintaining both the v1beta1 and v1beta2 gRPC APIs (proto) for a while (only the gRPC API; the v1beta1 controller is not maintained). After we remove the v1beta1 API, we also remove the new field from katib-config.

// SuggestionConfig is the JSON suggestion structure in Katib config.
type SuggestionConfig struct {
Image string `json:"image"`
ImagePullPolicy corev1.PullPolicy `json:"imagePullPolicy,omitempty"`
Resource corev1.ResourceRequirements `json:"resources,omitempty"`
ServiceAccountName string `json:"serviceAccountName,omitempty"`
VolumeMountPath string `json:"volumeMountPath,omitempty"`
PersistentVolumeClaimSpec corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`
PersistentVolumeSpec corev1.PersistentVolumeSpec `json:"persistentVolumeSpec,omitempty"`
PersistentVolumeLabels map[string]string `json:"persistentVolumeLabels,omitempty"`
}

<------------------------------ [Updated] ------------------------------

  1. Consolidate ParameterType and FeasibleSpace.Distribution to Distribution Remove ParameterType API and add Distribution API based on the hyperopt model like @andreyvelich mentioned at [WIP] Add enhancement for Parameter Distribution #2059 (comment).

------------------------------ [Updated] ------------------------------>

@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following hyperopt model instead of int and float model (e.g. uniform, quniform, loguniform, qloguniform) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.
Also, we can change step to q and integrate base parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.

Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform

@andreyvelich @gaocegege @johnugeorge @anencore94 wdyt?

Member

Hmm, gRPC might be a problem, yes.
Do we know how Kubernetes maintain 2 version of their gRPC APIs ?
e.g. v1 version for apps and v1beta2 version for apps ?

Member

@tenzen-y Also, are we going to rename intuniform to quniform and floatuniform to uniform as I proposed ?

Member Author

Do we know how Kubernetes maintain 2 version of their gRPC APIs ?

@andreyvelich Kubernetes uses helper functions to convert multiple APIs.

https://github.com/kubernetes/kubernetes/blob/c1c0e4fe0bb4e7c0145d45a010577ed64619903a/pkg/apis/apps/v1beta2/conversion.go

Does that answer your question?

Also, are we going to rename intuniform to quniform and floatuniform to uniform as I proposed ?

Yes, I updated the above comment.

)
```
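
The review thread above suggests keeping `ParameterType` and `Distribution` side by side for a release or two, with webhook validation that allows only one of them to be set. A minimal sketch of that check follows. It assumes a trimmed `ParameterSpec` that carries both fields, and `validateParameterSpec` is a hypothetical helper name, not an existing Katib function; the constant values are copied from the diff as written.

```go
package main

import (
	"errors"
	"fmt"
)

// ParameterType and Distribution mirror the string types in the diff above.
type ParameterType string
type Distribution string

const (
	ParameterTypeInt       ParameterType = "int"
	IntUniformDistribution Distribution  = "intUniform"
)

// ParameterSpec is a trimmed, transitional variant that carries both fields.
// Keeping both fields is an assumption taken from the review discussion.
type ParameterSpec struct {
	Name          string        `json:"name,omitempty"`
	ParameterType ParameterType `json:"parameterType,omitempty"`
	Distribution  Distribution  `json:"distribution,omitempty"`
}

// validateParameterSpec is a hypothetical helper (not part of Katib) that
// enforces "exactly one of ParameterType or Distribution is set".
func validateParameterSpec(p ParameterSpec) error {
	hasType := p.ParameterType != ""
	hasDist := p.Distribution != ""
	switch {
	case hasType && hasDist:
		return errors.New("parameterType and distribution are mutually exclusive")
	case !hasType && !hasDist:
		return errors.New("one of parameterType or distribution is required")
	}
	return nil
}

func main() {
	specs := []ParameterSpec{
		{Name: "old-style", ParameterType: ParameterTypeInt},
		{Name: "new-style", Distribution: IntUniformDistribution},
		{Name: "both", ParameterType: ParameterTypeInt, Distribution: IntUniformDistribution},
	}
	for _, s := range specs {
		fmt.Printf("%s: %v\n", s.Name, validateParameterSpec(s))
	}
}
```

In a real admission webhook the same check would run against every entry in the Experiment's parameter list; how the error is surfaced to the user is left open here.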

### Correspondence between Katib Distributions and Framework Distributions
We extend the suggestion services so that they can configure search-space distributions
using the libraries each framework provides.

#### Chocolate
TODO

Member Author

blocked by #2058


#### Goptuna
We can extend the Goptuna Suggestion Service using the Goptuna distributions listed in the
correspondence table below.

ref: https://github.com/c-bata/goptuna/blob/2245ddd9e8d1edba750839893c8a618f852bc1cf/distribution.go

| Katib Distribution | Goptuna Distribution |
|-----------------------------|------------------------------------------------------|
| CategoricalDistribution | CategoricalDistribution |
| IntUniformDistribution | IntUniformDistribution or StepIntUniformDistribution |
| IntLogUniformDistribution | IntUniformDistribution or StepIntUniformDistribution |
| FloatUniformDistribution | UniformDistribution |
| FloatLogUniformDistribution | LogUniformDistribution |
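
The table above can be read as a lookup from a Katib distribution to a Goptuna distribution. The sketch below encodes it as one, returning the Goptuna type name as a string so that it stays self-contained and does not import Goptuna. `goptunaDistributionName` is a hypothetical helper, and choosing the step variant only when `step > 1` is an assumption of this sketch; the table itself does not say when each integer variant applies.

```go
package main

import "fmt"

// Distribution mirrors the string type proposed in the Experiment API diff.
type Distribution string

const (
	CategoricalDistribution     Distribution = "categorical"
	IntUniformDistribution      Distribution = "intUniform"
	IntLogUniformDistribution   Distribution = "intLogUniform"
	FloatUniformDistribution    Distribution = "floatUniform"
	FloatLogUniformDistribution Distribution = "floatLogUniform"
)

// goptunaDistributionName is a hypothetical helper that returns the name of
// the Goptuna distribution the table lists for a Katib distribution.
// Assumption: the step variant is used only when step > 1.
func goptunaDistributionName(d Distribution, step int) (string, error) {
	switch d {
	case CategoricalDistribution:
		return "CategoricalDistribution", nil
	case IntUniformDistribution, IntLogUniformDistribution:
		if step > 1 {
			return "StepIntUniformDistribution", nil
		}
		return "IntUniformDistribution", nil
	case FloatUniformDistribution:
		return "UniformDistribution", nil
	case FloatLogUniformDistribution:
		return "LogUniformDistribution", nil
	}
	return "", fmt.Errorf("unsupported distribution %q", d)
}

func main() {
	name, err := goptunaDistributionName(IntUniformDistribution, 2)
	fmt.Println(name, err)
}
```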


#### Hyperopt
We can extend the Hyperopt Suggestion Service using the Hyperopt parameter expressions listed in the
correspondence table below.

ref: http://hyperopt.github.io/hyperopt/getting-started/search_spaces/#parameter-expressions

| Katib Distribution | Hyperopt Distribution |
|-----------------------------|-----------------------|
| CategoricalDistribution | hp.choice |
| IntUniformDistribution | hp.quniform |
| IntLogUniformDistribution | hp.qloguniform |
| FloatUniformDistribution | hp.quniform |
| FloatLogUniformDistribution | hp.qloguniform |
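
For readers less familiar with the quantized expressions in the table: the Hyperopt documentation describes `hp.quniform` as `round(uniform(low, high) / q) * q` and `hp.qloguniform` as `round(exp(uniform(low, high)) / q) * q`, where the bounds of the log variant are given in log space. The sketch below reproduces only that arithmetic. The function names are illustrative, the uniform draw `u` in `[0, 1)` is passed in explicitly to keep the functions deterministic, and Go's `math.Round` may break exact ties differently from Hyperopt.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// quantize snaps x to the nearest multiple of q.
func quantize(x, q float64) float64 {
	return math.Round(x/q) * q
}

// qUniform maps a uniform draw u in [0, 1) onto [low, high] and quantizes it,
// following the documented quniform formula.
func qUniform(u, low, high, q float64) float64 {
	return quantize(low+(high-low)*u, q)
}

// qLogUniform does the same for the log variant: low and high are bounds on
// the logarithm of the value, so the draw is exponentiated before quantizing.
func qLogUniform(u, low, high, q float64) float64 {
	return quantize(math.Exp(low+(high-low)*u), q)
}

func main() {
	rng := rand.New(rand.NewSource(42))
	// An integer-like parameter, e.g. a number of layers in [1, 64].
	fmt.Println(qUniform(rng.Float64(), 1, 64, 1))
	// A learning-rate-like parameter in [1e-4, 1e-1], quantized to 1e-4.
	fmt.Println(qLogUniform(rng.Float64(), math.Log(1e-4), math.Log(1e-1), 1e-4))
}
```

Because of the rounding step, a quantized sample can land slightly outside `[low, high]` when the bounds are not multiples of `q`, which is worth keeping in mind when mapping Katib's `step` to `q`.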

#### Optuna
We can extend the Optuna Suggestion Service using the Optuna distributions listed in the
correspondence table below.

ref: https://optuna.readthedocs.io/en/stable/reference/distributions.html

| Katib Distribution | Optuna Distribution |
|-----------------------------|---------------------------------------|
| CategoricalDistribution | distributions.CategoricalDistribution |
| IntUniformDistribution | distributions.IntDistribution |
| IntLogUniformDistribution | distributions.IntDistribution |
| FloatUniformDistribution | distributions.FloatDistribution |
| FloatLogUniformDistribution | distributions.FloatDistribution |

#### Scikit-Optimize
We can extend the Scikit-Optimize Suggestion Service using the Scikit-Optimize dimensions listed in the
correspondence table below.

ref: https://scikit-optimize.github.io/stable/modules/classes.html#module-skopt.space.space

| Katib Distribution | Scikit-Optimize Distribution |
|-----------------------------|------------------------------|
| CategoricalDistribution | space.Categorical |
| IntUniformDistribution | space.Integer |
| IntLogUniformDistribution | space.Integer |
| FloatUniformDistribution | space.Real |
| FloatLogUniformDistribution | space.Real |

Comment on lines +125 to +126

Member

Member Author

Yes, that's right. We need to set the prior argument in skopt. Also, we need to set the log argument in optuna.
I will add them to this enhancement proposal.

ref: https://optuna.readthedocs.io/en/stable/reference/generated/optuna.distributions.FloatDistribution.html#optuna-distributions-floatdistribution
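
Since the Optuna and Scikit-Optimize tables map the uniform and log-uniform variants to the same class, the distinction has to travel in an argument: `log` for Optuna's `IntDistribution` and `FloatDistribution`, and `prior` (`"uniform"` or `"log-uniform"`) for skopt's `Integer` and `Real`. The sketch below records those arguments per Katib distribution. The `frameworkArgs` struct and `lookupArgs` helper are hypothetical names for illustration, and the sketch only describes what a suggestion service would pass; it does not call either library.

```go
package main

import "fmt"

// Distribution mirrors the string type proposed in the Experiment API diff.
type Distribution string

const (
	CategoricalDistribution     Distribution = "categorical"
	IntUniformDistribution      Distribution = "intUniform"
	IntLogUniformDistribution   Distribution = "intLogUniform"
	FloatUniformDistribution    Distribution = "floatUniform"
	FloatLogUniformDistribution Distribution = "floatLogUniform"
)

// frameworkArgs is a hypothetical record of the class and the log-related
// argument each framework would receive for one Katib distribution.
type frameworkArgs struct {
	OptunaClass string
	OptunaLog   bool   // value of the `log` argument
	SkoptClass  string
	SkoptPrior  string // value of the `prior` argument; unused for categorical here
}

var argsByDistribution = map[Distribution]frameworkArgs{
	CategoricalDistribution:     {OptunaClass: "distributions.CategoricalDistribution", SkoptClass: "space.Categorical"},
	IntUniformDistribution:      {OptunaClass: "distributions.IntDistribution", SkoptClass: "space.Integer", SkoptPrior: "uniform"},
	IntLogUniformDistribution:   {OptunaClass: "distributions.IntDistribution", OptunaLog: true, SkoptClass: "space.Integer", SkoptPrior: "log-uniform"},
	FloatUniformDistribution:    {OptunaClass: "distributions.FloatDistribution", SkoptClass: "space.Real", SkoptPrior: "uniform"},
	FloatLogUniformDistribution: {OptunaClass: "distributions.FloatDistribution", OptunaLog: true, SkoptClass: "space.Real", SkoptPrior: "log-uniform"},
}

// lookupArgs returns the recorded arguments or an error for unknown values.
func lookupArgs(d Distribution) (frameworkArgs, error) {
	a, ok := argsByDistribution[d]
	if !ok {
		return frameworkArgs{}, fmt.Errorf("unsupported distribution %q", d)
	}
	return a, nil
}

func main() {
	a, err := lookupArgs(FloatLogUniformDistribution)
	fmt.Printf("%+v %v\n", a, err)
}
```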