[WIP] Add enhancement for Parameter Distribution #2059

Closed

Member

@tenzen-y tenzen-y commented Dec 12, 2022

Signed-off-by: tenzen-y [email protected]

What this PR does / why we need it:
I added an enhancement proposal for Parameter Distribution as discussed in this.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
related #1207

Checklist:

  • Docs included if any changes are user facing

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y tenzen-y force-pushed the add-proposal-log-uniform-scale branch from 05ffd16 to 93bdfc2 on December 12, 2022 03:21
@tenzen-y
Member Author

/hold for the review

type ParameterSpec struct {
Name string `json:"name,omitempty"`
- ParameterType ParameterType `json:"parameterType,omitempty"`
+ Distribution Distribution `json:"distribution,omitempty"`
Member

Great proposal @tenzen-y . One question, how do we provide backward compatibility?

Member Author

@johnugeorge This is a good point.

For the time being (1~2 releases?), I think we can support ParameterType and Distribution concurrently.
This means that when users specify ParameterType, the suggestion services operate as they do now; when users specify Distribution, the suggestion services pass the distribution to the sampler.

Also, we should add webhook validation so that only one of ParameterType and Distribution can be set (the two fields are mutually exclusive).

@andreyvelich @johnugeorge wdyt?
If you agree with this, I will add this to the Proposal.
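A minimal sketch of the exclusivity check described above, assuming a trimmed `ParameterSpec` that carries both fields during the transition. The function name `validateParameterSpec` and the error messages are hypothetical and not part of Katib; a real webhook would call something similar from its admission handler.

```go
package main

import "fmt"

// ParameterType and Distribution mirror the string-typed enums in the diff above.
type ParameterType string
type Distribution string

// ParameterSpec is a trimmed, hypothetical spec that keeps both fields
// while ParameterType and Distribution are supported side by side.
type ParameterSpec struct {
	Name          string        `json:"name,omitempty"`
	ParameterType ParameterType `json:"parameterType,omitempty"`
	Distribution  Distribution  `json:"distribution,omitempty"`
}

// validateParameterSpec (hypothetical helper) requires that exactly one of
// ParameterType and Distribution is set.
func validateParameterSpec(p ParameterSpec) error {
	hasType := p.ParameterType != ""
	hasDist := p.Distribution != ""
	switch {
	case hasType && hasDist:
		return fmt.Errorf("parameter %q: parameterType and distribution are mutually exclusive", p.Name)
	case !hasType && !hasDist:
		return fmt.Errorf("parameter %q: one of parameterType or distribution must be set", p.Name)
	}
	return nil
}

func main() {
	specs := []ParameterSpec{
		{Name: "lr", ParameterType: "double"},
		{Name: "batch", Distribution: "intUniform"},
		{Name: "bad", ParameterType: "int", Distribution: "intUniform"},
	}
	for _, s := range specs {
		fmt.Printf("%s: %v\n", s.Name, validateParameterSpec(s))
	}
}
```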

Member

SGTM. Also, add deprecation tag to ParameterType

Member Author

Also, add deprecation tag to ParameterType

SGTM
I will add the tag to the following:

ParameterType parameter_type = 2; /// Type of the parameter.

Member Author

@tenzen-y tenzen-y Dec 22, 2022

If we add only new features to v1beta2 API, deprecation labels are unnecessary since we create a separate proto definition for v1beta2 API as discussed in #2059 (comment).

search space using libraries provided in each framework.

#### Chocolate
TODO
Member Author

blocked by #2058

Member

@andreyvelich andreyvelich left a comment

Thanks a lot for driving this @tenzen-y!
I left a few comments.

Currently, Katib does not let users specify the distribution from which samplers draw parameters in the search space.

Katib should let users specify it, since
almost all hyperparameter tuning algorithms (frameworks) allow users to do so.
Member

Can you please link the appropriate issue (#1207) in this proposal's motivation?

Member Author

Makes sense.

Comment on lines +125 to +126
| IntUniformDistribution | space.Integer |
| IntLogUniformDistribution | space.Integer |
Member

Member Author

Yes, that's right. We need to set the prior argument in skopt. Also, we need to set the log argument in optuna.
I will add them to this enhancement proposal.

ref: https://optuna.readthedocs.io/en/stable/reference/generated/optuna.distributions.FloatDistribution.html#optuna-distributions-floatdistribution
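As a rough illustration of the point above, the table below records which backend argument each proposed distribution would need. It assumes skopt's `space.Integer` / `space.Real` with their `prior` argument and optuna's `IntDistribution` / `FloatDistribution` with their `log` argument; the `backendArgs` type and `argsFor` helper are hypothetical and only show the shape of the mapping, not how the suggestion services implement it.

```go
package main

import "fmt"

// Distribution mirrors the enum proposed in the diff.
type Distribution string

const (
	IntUniformDistribution      Distribution = "intUniform"
	IntLogUniformDistribution   Distribution = "intLogUniform"
	FloatUniformDistribution    Distribution = "floatUniform"
	FloatLogUniformDistribution Distribution = "floatLogUniform"
)

// backendArgs (hypothetical) lists the class and argument each backend needs.
type backendArgs struct {
	SkoptDimension string // class in skopt.space
	SkoptPrior     string // value of the `prior` argument
	OptunaClass    string // class in optuna.distributions
	OptunaLog      bool   // value of the `log` argument
}

// mapping is an assumed correspondence, written out for discussion only.
var mapping = map[Distribution]backendArgs{
	IntUniformDistribution:      {"space.Integer", "uniform", "IntDistribution", false},
	IntLogUniformDistribution:   {"space.Integer", "log-uniform", "IntDistribution", true},
	FloatUniformDistribution:    {"space.Real", "uniform", "FloatDistribution", false},
	FloatLogUniformDistribution: {"space.Real", "log-uniform", "FloatDistribution", true},
}

// argsFor returns the backend arguments for d, or false if d is not mapped.
func argsFor(d Distribution) (backendArgs, bool) {
	a, ok := mapping[d]
	return a, ok
}

func main() {
	for _, d := range []Distribution{IntUniformDistribution, IntLogUniformDistribution} {
		a, _ := argsFor(d)
		fmt.Printf("%s -> skopt %s(prior=%q), optuna %s(log=%v)\n",
			d, a.SkoptDimension, a.SkoptPrior, a.OptunaClass, a.OptunaLog)
	}
}
```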

Comment on lines +59 to +62
+ IntUniformDistribution Distribution = "intUniform"
+ IntLogUniformDistribution Distribution = "intLogUniform"
+ FloatUniformDistribution Distribution = "floatUniform"
+ FloatLogUniformDistribution Distribution = "floatLogUniform"
Member

@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following the hyperopt model (e.g. uniform, quniform, loguniform, qloguniform) instead of the int and float model? From my point of view, it sounds more native to HP tuning, and many HP papers mention those distributions.
Also, we can change step to q and integrate base parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.

Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform

Member Author

What do you think about following the hyperopt model (e.g. uniform, quniform, loguniform, qloguniform) instead of the int and float model? From my point of view, it sounds more native to HP tuning, and many HP papers mention those distributions.

@andreyvelich Sounds good. I would add the corresponding tables for the old ParameterType and new Distribution using the hyperopt model to this proposal.

Also, we can change step to q and integrate base parameter for the log.

@andreyvelich Sounds good. One question: does "integrate base parameter for the log" mean adding a base field to the FeasibleSpace struct?

Member

SGTM. While I am thinking if this is a huge change to our YAML APIs.

Member Author

SGTM. While I am thinking if this is a huge change to our YAML APIs.

Maybe, we need to change the API version to v1beta2.

Member

SGTM

Member Author

SGTM

Is it possible to convert v1beta1 resource object to v1beta2? Will it drop some necessary info from the conversion?

I will create a correspondence table between v1beta1 and v1beta2. Maybe, we only need to create a table for the ParameterType and the FeasibleSpace.

When will the webhook be configured? Should we install it by default?

IIUC, we do not need to install manifests for conversion webhook to clusters.

ref:

And when will we deprecate v1beta1?

IMO, we need to keep maintaining v1beta1 for at least one release version. This means if we introduce v1beta2 API in katib v0.16.0, we will remove v1beta1 API in katib v0.17.0.

@gaocegege Do you know how many release versions we kept maintaining v1alpha2 after we introduced v1beta1?
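The correspondence table mentioned above could eventually be backed by a small conversion helper. The sketch below is only an assumption about how that might look: the v1beta1 `ParameterType` values are written from memory of the current API, the `Distribution` names follow the hyperopt-style naming suggested earlier in this thread, and `convertParameterType` is a hypothetical function. The mapping for discrete parameters is deliberately left open because the thread does not settle it.

```go
package main

import "fmt"

// ParameterType values as assumed for the v1beta1 API.
type ParameterType string

const (
	ParameterTypeDouble      ParameterType = "double"
	ParameterTypeInt         ParameterType = "int"
	ParameterTypeDiscrete    ParameterType = "discrete"
	ParameterTypeCategorical ParameterType = "categorical"
)

// Distribution values assumed for a future API, using hyperopt-style names.
type Distribution string

const (
	DistributionUniform     Distribution = "uniform"
	DistributionQUniform    Distribution = "quniform"
	DistributionCategorical Distribution = "categorical"
)

// convertParameterType (hypothetical) maps a v1beta1 ParameterType to a
// Distribution. Types without an agreed mapping return an error instead of
// being silently dropped during conversion.
func convertParameterType(t ParameterType) (Distribution, error) {
	switch t {
	case ParameterTypeDouble:
		return DistributionUniform, nil
	case ParameterTypeInt:
		return DistributionQUniform, nil
	case ParameterTypeCategorical:
		return DistributionCategorical, nil
	default:
		return "", fmt.Errorf("no distribution mapping defined for parameter type %q", t)
	}
}

func main() {
	for _, t := range []ParameterType{ParameterTypeDouble, ParameterTypeInt, ParameterTypeDiscrete} {
		d, err := convertParameterType(t)
		fmt.Printf("%s -> %q (err: %v)\n", t, d, err)
	}
}
```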

Member Author

@tenzen-y tenzen-y Dec 15, 2022

Yep, I think so. But we need a detailed design for this to see if it is possible.

@andreyvelich @gaocegege Maybe custom (user-implemented) suggestion services that use the v1beta1 API will not work, since gRPC calls do not go through the conversion webhook.

<------------------------------ [Updated] ------------------------------
So, we probably need to separate the CRD version change from the introduction of Distribution. I will then cover only introducing Distribution in this proposal; we can follow up on upgrading the CRD version in other issues and PRs.

- Introducing Distribution: we keep using ParameterType and introduce Distribution and Base in FeasibleSpace, as in the following.

#1207 (comment)

So, I would like to proceed as follows:

- Upgrading CRD version:

------------------------------ [Updated] ------------------------------>

  1. Introduce a new field that represents the gRPC API version (v1beta1 or v1beta2) in the following katib-config struct, since the suggestion controller needs to use a different gRPC client for v1beta1 and v1beta2. This means we keep maintaining both the v1beta1 and v1beta2 gRPC APIs (proto) for a while (only the gRPC API; we do not maintain a v1beta1 controller). After we remove the v1beta1 API, we also remove the new field from katib-config.

// SuggestionConfig is the JSON suggestion structure in Katib config.
type SuggestionConfig struct {
	Image                     string                           `json:"image"`
	ImagePullPolicy           corev1.PullPolicy                `json:"imagePullPolicy,omitempty"`
	Resource                  corev1.ResourceRequirements      `json:"resources,omitempty"`
	ServiceAccountName        string                           `json:"serviceAccountName,omitempty"`
	VolumeMountPath           string                           `json:"volumeMountPath,omitempty"`
	PersistentVolumeClaimSpec corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`
	PersistentVolumeSpec      corev1.PersistentVolumeSpec      `json:"persistentVolumeSpec,omitempty"`
	PersistentVolumeLabels    map[string]string                `json:"persistentVolumeLabels,omitempty"`
}

<------------------------------ [Updated] ------------------------------

  2. Consolidate ParameterType and FeasibleSpace.Distribution into Distribution: remove the ParameterType API and add a Distribution API based on the hyperopt model, as @andreyvelich mentioned in [WIP] Add enhancement for Parameter Distribution #2059 (comment).

------------------------------ [Updated] ------------------------------>

@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following the hyperopt model (e.g. uniform, quniform, loguniform, qloguniform) instead of the int and float model? From my point of view, it sounds more native to HP tuning, and many HP papers mention those distributions.
Also, we can change step to q and integrate base parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.

Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform

@andreyvelich @gaocegege @johnugeorge @anencore94 wdyt?

Member

Hmm, gRPC might be a problem, yes.
Do we know how Kubernetes maintains two versions of their gRPC APIs?
E.g. the v1 version for apps and the v1beta2 version for apps?

Member

@tenzen-y Also, are we going to rename intuniform to quniform and floatuniform to uniform as I proposed ?

Member Author

Do we know how Kubernetes maintains two versions of their gRPC APIs?

@andreyvelich Kubernetes uses helper functions to convert between API versions.

https://github.com/kubernetes/kubernetes/blob/c1c0e4fe0bb4e7c0145d45a010577ed64619903a/pkg/apis/apps/v1beta2/conversion.go

Does that answer your question?

Also, are we going to rename intuniform to quniform and floatuniform to uniform as I proposed ?

Yes, I updated the above comment.

- ParameterTypeCategorical ParameterType = "categorical"
+ UnknownDistribution Distribution = "unknown"
+ CategoricalDistribution Distribution = "categorical"
+ IntUniformDistribution Distribution = "intUniform"
Member

Should we use camel case here? I personally prefer lowercase, e.g. intuniform.

Member Author

SGTM

@tenzen-y tenzen-y changed the title from "Add enhancement for Parameter Distribution" to "[WIP] Add enhancement for Parameter Distribution" on Jan 7, 2023
@tenzen-y
Member Author

tenzen-y commented Jan 7, 2023

I will work on this proposal after the kubeflow 1.7 feature freeze date.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member Author

/remove-lifecycle stale

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Member

/lifecycle frozen

@andreyvelich: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member Author

/remove-lifecycle stale

@PeterWrighten

PeterWrighten commented Mar 8, 2024

Hi, I'm interested in this project and in Project5 related to GSoC 2024, and I'm looking for docs or proposals with more details. If you could point me to them, it would help me a lot. Thanks! @tenzen-y @andreyvelich

github-actions bot commented Jun 7, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member Author

tenzen-y commented Jun 7, 2024

/remove-lifecycle stale

5 participants