-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding initial v1alpha2 API controller #457
Adding initial v1alpha2 API controller #457
Conversation
log.Info("Error in converting unstructured to status: %v ", err) | ||
return err | ||
} | ||
if jobStatus.Active == 0 && jobStatus.Succeeded > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think batch job supports JobCondition now: https://github.com/kubernetes/api/blob/master/batch/v1/types.go#L168
func (exp *Experiment) MarkExperimentStatusSucceeded(reason, message string) { | ||
currentCond := getCondition(exp, ExperimentRunning) | ||
if currentCond != nil { | ||
exp.setCondition(ExperimentRunning, v1.ConditionFalse, currentCond.Reason, currentCond.Message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why mark ExperimentRunning as v1.ConditionFalse?
and why not mark ExperimentCreated as v1.ConditionFalse when MarkExperimentStatusRunning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the same conventions from the job operators. But, it is true that we have to rethink on making more meaningful conditions. eg: we may not need Successful
condition and Running
condition. We could make Running
as false stating reason as Successful
. This is followed in k8s default objects
04b1cc4
to
8c48f16
Compare
/retest |
func (r *ReconcileExperiment) createTrials(instance *experimentsv1alpha2.Experiment, addCount int) error { | ||
|
||
logger := log.WithValues("Experiment", types.NamespacedName{Name: instance.GetName(), Namespace: instance.GetNamespace()}) | ||
trials, err := util.GetSuggestions(instance, addCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnugeorge
It can run more than one Suggestion at once, since we name it GetSuggestions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you have named this function GetSuggestions
. Is it make sense? I thought we should name it GetSuggestion
.
completedCount := instance.Status.TrialsSucceeded + instance.Status.TrialsFailed + instance.Status.TrialsKilled | ||
|
||
if activeCount > parallelCount { | ||
deleteCount := activeCount - parallelCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 215 and 223 can change to if activeCount >= parallelCount {} else {}
if deleteCount > 0 { | ||
//delete 'deleteCount' number of trails. Sort them? | ||
logger.Info("DeleteTrials", "deleteCount", deleteCount) | ||
err = r.deleteTrials(instance, deleteCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.deleteTrials will make Status.TrialsKilled? will it make completedCount value dirty in line 213?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that TrialsKilled has a different meaning. We should just delete the trial here.
@richardsliu @YujiOshima @johnugeorge @gaocegege how about merging this PR first? Then we can code further based on it. |
SGTM. /lgtm |
@hougangliu @gaocegege @richardsliu @YujiOshima @johnugeorge @gaocegege |
instance.Status.StartTime = &now | ||
} | ||
msg := "Trial is created" | ||
instance.MarkTrialStatusCreated(util.TrialCreatedReason, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MarkTrialStatusCreated is collect? It looks you need to mark running for going next step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trial is marked running when jobs are in running state
instance.Status.StartTime = &now | ||
} | ||
msg := "Experiment is created" | ||
instance.MarkExperimentStatusCreated(util.ExperimentCreatedReason, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MarkTrialStatusCreated is collect? It looks you need to mark running for going next step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the Experiment is created, it is marked Created. It is marked running when trials are created.
@hougangliu @andreyvelich Thanks, I agree with we proceed step by step since this change will be big and complicated. |
I have verified creation of experiment and trial objects with status updates. We can merge this PR if current review is completed. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hougangliu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Adding initial v1alpha2 API controller (kubeflow#457) * Adding initial v1alpha2 controller * Adding logs * Adding comments * Adding template functions for experiment * Adding error checks * chore: Skip test when code is not changed (kubeflow#467) Signed-off-by: Ce Gao <[email protected]> * Add serviceAccountName in UI deployment (kubeflow#469) * feat: Introduce suggestion Signed-off-by: Ce Gao <[email protected]> * owners: Add xinyi yu Signed-off-by: Ce Gao <[email protected]>
This is an initial version of level based implementation of controller for early review of code flow. I am waiting for #435 and #456 to get merged first and make necessary changes after that.
Fixes: #441
/cc @richardsliu @gaocegege @YujiOshima
This change is