Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSoC] Add New Parameter in tune #2369

Merged
merged 13 commits into from
Jul 18, 2024

Conversation

Electronic-Waste
Copy link
Member

@Electronic-Waste Electronic-Waste commented Jun 23, 2024

What this PR does / why we need it:

I add a new parameter metrics_collector_config to tune function. Design details: https://github.com/kubeflow/katib/blob/cc95ef03cb25df3d86a1a1c20c9c69bad17fce92/docs/proposals/push-based-metrics-collection.md#add-new-parameter-in-tune

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@Electronic-Waste
Copy link
Member Author

ref issue: #2340

@Electronic-Waste
Copy link
Member Author

PTAL👀 @andreyvelich @johnugeorge

@Electronic-Waste
Copy link
Member Author

@andreyvelich I add my code here:

// Add Katib Trial labels to the Pod metadata.
mutatePodMetadata(mutatedPod, trial)

// Pass env variable KATIB_TRIAL_NAME to training containers using fieldPath.
for idx := range mutatedPod.Spec.Containers {
	if mutatedPod.Spec.Containers[idx].Env == nil {
		mutatedPod.Spec.Containers[idx].Env = []v1.EnvVar{}
	}
	mutatedPod.Spec.Containers[idx].Env = append(
		mutatedPod.Spec.Containers[idx].Env, 
		v1.EnvVar{
			Name: consts.EnvTrialName,
			ValueFrom: &v1.EnvVarSource{
				FieldRef: &v1.ObjectFieldSelector{
					FieldPath: fmt.Sprintf("metadata.labels['%s']", consts.LabelTrialName),
				},
			},
		},
	)
}

PTAL👀. Thank you for your review!

@Electronic-Waste
Copy link
Member Author

/area gsoc

@andreyvelich
Copy link
Member

I don't think, we should mutate KATIB_TRIAL_NAME env to all containers in the Pod.
So maybe, similar as in wrapWorkerContainer function, we should only add this environment to the primary container.

@andreyvelich
Copy link
Member

Please rebase your PR as well @Electronic-Waste

@andreyvelich
Copy link
Member

@Electronic-Waste Additionally, I think we forgot to add this line to the post-gen script of Katib SDK:

# Import Katib report metrics functions
from kubeflow.katib.api.report_metrics import report_metrics

Similar to how we include other imports to the__init__.py.
Since this file is auto-generated by code-gen, we need to update it via post-gen script.

@Electronic-Waste Please update this script, otherwise after we re-generate the SDK, your import will be deleted.

@Electronic-Waste
Copy link
Member Author

@andreyvelich Done! I've rebased the branch, modified the code to pass env variable only to the primary container and added the importing code to the post-gen script of Katib SDK.

@Electronic-Waste
Copy link
Member Author

@andreyvelich Thank you for your detailed review!

pkg/webhook/v1beta1/pod/inject_webhook.go Outdated Show resolved Hide resolved
@@ -140,15 +141,37 @@ func (s *SidecarInjector) Mutate(pod *v1.Pod, namespace string) (*v1.Pod, error)
// Add Katib Trial labels to the Pod metadata.
mutatePodMetadata(mutatedPod, trial)

// Pass env variable KATIB_TRIAL_NAME to the primary container using fieldPath.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Electronic-Waste Additionally, since we always pass the Trial name to the training container, we should not allow to pass Trial name and namespace via template generator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not friendly to users. Since junior users are unfamiliar with that, they may be puzzled about why they can't pass Trial's namespace and name to the training container. It's more straightforward for them to simply define trialParameter trialName and use it in the pod env as an example suggests.

Also, it's more backward-compatible to just allow users passing Trial name and namespace via triallParameter and pod env. WDYT👀 @andreyvelich @tenzen-y @johnugeorge .

Copy link
Member

@andreyvelich andreyvelich Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since junior users are unfamiliar with that, they may be puzzled about why they can't pass Trial's namespace and name to the training container.

Since we always pass KATIB_TRIAL_NAME to the Trial's Pods, users will always be able to read this environment in their training code, isn't ?
However, this can be useful for other part of the Pod spec: What if user wants to generate different volume which contains Trial name.

@Electronic-Waste Please can you an create an issue to track this discussion ? I think, we should discuss the future of Trial metadata parameters.

pkg/webhook/v1beta1/pod/inject_webhook.go Outdated Show resolved Hide resolved
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Jul 16, 2024
@Electronic-Waste
Copy link
Member Author

@andreyvelich I've wrapped the code into mutatePodEnv and added some unit tests for it. PTAL👀

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Electronic-Waste!
just a small comment from me.

pkg/webhook/v1beta1/pod/inject_webhook_test.go Outdated Show resolved Hide resolved
Signed-off-by: Electronic-Waste <[email protected]>
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should be ready to merge it.
Thanks to your contribution @tenzen-y 🎉
/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit a8840f2 into kubeflow:master Jul 18, 2024
60 checks passed
shashank-iitbhu pushed a commit to shashank-iitbhu/katib that referenced this pull request Jul 25, 2024
* chore: add metrics_collector_config in tune function.

Signed-off-by: Electronic-Waste <[email protected]>

* rebase: rebase feat/new-param-tune to master.

Signed-off-by: Electronic-Waste <[email protected]>

* chore: add metrics collector kind list in comment.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: always pass Trial name to the training container.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: delete passing env variable logics in katib_client.py

Signed-off-by: Electronic-Waste <[email protected]>

* fix: passing env variable KATIB_TRIAL_NAME in the webhook of pod.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: pass env variable KATIB_TRIAL_NAME only to the primary container.

Signed-off-by: Electronic-Waste <[email protected]>

* chore: add report_metrics in post_gen.py.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: change nil error to allErrs(deleted by accident).

Signed-off-by: Electronic-Waste <[email protected]>

* fix: fix lint error in inject_webhook.go.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: wrap env variables passing logics into mutatePodEnv.

Signed-off-by: Electronic-Waste <[email protected]>

* chore: add unit tests for mutatePodEnv.

Signed-off-by: Electronic-Waste <[email protected]>

* fix: delete protocmp.

Signed-off-by: Electronic-Waste <[email protected]>

---------

Signed-off-by: Electronic-Waste <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants