-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add enhancement for Parameter Distribution #2059
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d240479
to
05ffd16
Compare
Signed-off-by: tenzen-y <[email protected]>
05ffd16
to
93bdfc2
Compare
/hold for the review |
type ParameterSpec struct { | ||
Name string `json:"name,omitempty"` | ||
- ParameterType ParameterType `json:"parameterType,omitempty"` | ||
+ Distribution Distribution `json:"distribution,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great proposal @tenzen-y . One question, how do we provide backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnugeorge This is a good point.
For the time being (1~2 releases?), I think we can operate ParameterType
and Distribution
concurrently.
This means in the case of users determining ParameterType
, suggestion-services operate as now; in the case of users determining Distribution
, suggestion-services set distributions to sampler.
Also, we should add webhook validation to restrict ParameterType
and Distribution
so that only one of them is available. (ParameterType
and Distribution
are exclusive)
@andreyvelich @johnugeorge wdyt?
If you agree with this, I will add this to the Proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Also, add deprecation tag to ParameterType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, add deprecation tag to ParameterType
SGTM
I will add the tag to the following:
katib/pkg/apis/manager/v1beta1/api.proto
Line 83 in f941ec6
ParameterType parameter_type = 2; /// Type of the parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add only new features to v1beta2 API, deprecation labels are unnecessary since we create a separate proto definition for v1beta2 API as discussed in #2059 (comment).
search space using libraries provided in each framework. | ||
|
||
#### Chocolate | ||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocked by #2058
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for driving this @tenzen-y!
I left few comments.
Currently, Katib does not support determining a distribution for search space that samplers pick up parameters by users. | ||
|
||
Katib should be able to determine it by users since | ||
almost hyperparameter tuning algorithms (framework) can determine it by users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you link the appropriate issue: #1207 to this proposal motivation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
| IntUniformDistribution | space.Integer | | ||
| IntLogUniformDistribution | space.Integer | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, you can set uniform or loguniform in skopt using prior
API: https://scikit-optimize.github.io/stable/modules/generated/skopt.space.space.Real.html#skopt.space.space.Real
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. We need to set the prior argument in skopt. Also, we need to set the log argument in optuna.
I will add them to this enhancement proposal.
+ IntUniformDistribution Distribution = "intUniform" | ||
+ IntLogUniformDistribution Distribution = "intLogUniform" | ||
+ FloatUniformDistribution Distribution = "floatUniform" | ||
+ FloatLogUniformDistribution Distribution = "floatLogUniform" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following hyperopt model instead of int
and float
model (e.g. uniform
, quniform
, loguniform
, qloguniform
) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.
Also, we can change step
to q
and integrate base
parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.
Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about following hyperopt model instead of int and float model (e.g. uniform, quniform, loguniform, qloguniform) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.
@andreyvelich Sounds good. I would add the corresponding tables for the old ParameterType and new Distribution using the hyperopt model to this proposal.
Also, we can change step to q and integrate base parameter for the log.
@andreyvelich Sounds good. One question, Does integrate base parameter for the log
mean adding the base
field to struct FeasibleSpace
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. While I am thinking if this is a huge change to our YAML APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. While I am thinking if this is a huge change to our YAML APIs.
Maybe, we need to change the API version to v1beta2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
Is it possible to convert v1beta1 resource object to v1beta2? Will it drop some necessary info from the conversion?
I will create a correspondence table between v1beta1 and v1beta2. Maybe, we only need to create a table for the ParameterType
and the FeasibleSpace
.
When will the webhook be configured? Should we install it by default?
IIUC, we do not need to install manifests for conversion webhook to clusters.
ref:
- https://book.kubebuilder.io/multiversion-tutorial/tutorial.html
- https://github.com/kubernetes-sigs/kubebuilder/tree/master/docs/book/src/multiversion-tutorial/testdata/project
And when will we deprecate v1beta1?
IMO, we need to keep maintaining v1beta1 for at least one release version. This means if we introduce v1beta2 API in katib v0.16.0, we will remove v1beta1 API in katib v0.17.0.
@gaocegege Do you know how many release versions we kept maintaining v1alpha2 after we introduced v1beta1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think so. But we need a detailed design for this to see if it is possible.
@andreyvelich @gaocegege Maybe, custom (implemented by user) suggestion services using v1beta1 API will not work since gRPC calls are not through conversion webhook.
<------------------------------ [Updated] ------------------------------
So, we probably need to separate CRD version changes from Distribution introduces. And then I take up only Introducing Distribution
in this proposal. We can follow up on Upgrading CRD version
in other issues and PRs.
- Introducing Distribution
: we keep using ParameterType
and introducing Distribution
and Base
to FeasibleSpace
like the following.
So, I would like to work in the following:
- Upgrading CRD version:
------------------------------ [Updated] ------------------------------>
- introduce a new field that represents the gRPC API version (v1beta1 or v1beta2) to the following of katib-config since the suggestion controller needs to use a different gRPC client for v1beta1 and v1beta2. This means we keep maintaining both v1beta1 and v1beta2 gRPC APIs (proto) for a while (only gRPC API, no maintaining v1beta1 controller). And then after we remove the v1beta1 API, remove the new field in katib-config.
katib/pkg/util/v1beta1/katibconfig/config.go
Lines 35 to 45 in db72ce1
// SuggestionConfig is the JSON suggestion structure in Katib config. | |
type SuggestionConfig struct { | |
Image string `json:"image"` | |
ImagePullPolicy corev1.PullPolicy `json:"imagePullPolicy,omitempty"` | |
Resource corev1.ResourceRequirements `json:"resources,omitempty"` | |
ServiceAccountName string `json:"serviceAccountName,omitempty"` | |
VolumeMountPath string `json:"volumeMountPath,omitempty"` | |
PersistentVolumeClaimSpec corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"` | |
PersistentVolumeSpec corev1.PersistentVolumeSpec `json:"persistentVolumeSpec,omitempty"` | |
PersistentVolumeLabels map[string]string `json:"persistentVolumeLabels,omitempty"` | |
} |
<------------------------------ [Updated] ------------------------------
ConsolidateRemoveParameterType
andFeasibleSpace.Distribution
toDistribution
ParameterType
API and addDistribution
API based on the hyperopt model like @andreyvelich mentioned at [WIP] Add enhancement for Parameter Distribution #2059 (comment).
------------------------------ [Updated] ------------------------------>
@johnugeorge @tenzen-y @gaocegege @anencore94 What do you think about following hyperopt model instead of int and float model (e.g. uniform, quniform, loguniform, qloguniform) ? From my point of view, it sounds more native to HP tuning and many HPs papers mention that distribution.
Also, we can change step to q and integrate base parameter for the log.
Many data scientists who do HP tuning are familiar with Hyperopt, so the API will look the same for them.Also, Ray Tune follows the same model: https://docs.ray.io/en/latest/tune/api_docs/search_space.html, and NNI has the same APIs: https://nni.readthedocs.io/en/stable/hpo/search_space.html#quniform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, gRPC might be a problem, yes.
Do we know how Kubernetes maintain 2 version of their gRPC APIs ?
e.g. v1 version for apps and v1beta2 version for apps ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y Also, are we going to rename intuniform
to quniform
and floatuniform
to uniform
as I proposed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know how Kubernetes maintain 2 version of their gRPC APIs ?
@andreyvelich Kubernetes uses helper functions to convert multiple APIs.
Does that answer your question?
Also, are we going to rename intuniform to quniform and floatuniform to uniform as I proposed ?
Yes, I updated the above comment.
- ParameterTypeCategorical ParameterType = "categorical" | ||
+ UnknownDistribution Distribution = "unknown" | ||
+ CategoricalDistribution Distribution = "categorical" | ||
+ IntUniformDistribution Distribution = "intUniform" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use camel case here? Personally prefer lower case intuniform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
+ IntUniformDistribution Distribution = "intUniform" | ||
+ IntLogUniformDistribution Distribution = "intLogUniform" | ||
+ FloatUniformDistribution Distribution = "floatUniform" | ||
+ FloatLogUniformDistribution Distribution = "floatLogUniform" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. While I am thinking if this is a huge change to our YAML APIs.
I will work on this proposal after the kubeflow 1.7 feature freeze date. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
@andreyvelich: The In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
Hi, I'm interested in this project and Project5 related to GSoC 2024, and seeking for some docs or proposals for more details. If you can refer more details, it would help me a lot. Thanks! @tenzen-y @andreyvelich |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
Signed-off-by: tenzen-y [email protected]
What this PR does / why we need it:
I added an enhancement proposal for Parameter Distribution as discussed in this.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):related #1207
Checklist: