pkg: Add recorder support #312
Hi @gaocegege. Thanks for your PR. I'm waiting for a kubernetes or tensorflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Generally LGTM with some nits.
pkg/trainer/replicas.go (outdated)
return err
} else {
nit: we have already returned in the previous if statement, so the else branch can be removed here.
}
} else {
ditto.
There is a sub-branch here: https://github.com/tensorflow/k8s/pull/312/files/47249c0893c5be0c9e86f3f77ebbb0ea66c1726a#diff-d2bc8c1807fa25d2b911d2d781f48a07L178.
And if we remove the else, some redundant events will be published.
I mean the else at line 195, not line 191.
Yeah, I know. But if we remove the else, the branch where the service already exists will publish events, too.
}
} else {
ditto.
pkg/trainer/replicas.go (outdated)
if err != nil {
log.Errorf("Error creating PS ConfigMap: %v, %v", cm.ObjectMeta.Name, err)
log.Errorf("Error creating PS ConfigMap: %v, %v", createdCM.ObjectMeta.Name, err)
s.recorder.Eventf(s.Job.job, v1.EventTypeWarning, FailedCreateReason, "Error creating: %v", err)
Maybe we can add more info here, e.g. "Error creating configmaps: xxx".
SGTM
// If the job already exists do nothing.
if err != nil {
if k8s_errors.IsAlreadyExists(err) {
log.Infof("Service %v already exists.", s.jobName(index))
} else {
return k8sErrors.NewAggregate([]error{fmt.Errorf("Creating service %v returned error.", service.ObjectMeta.Name), err})
s.recorder.Eventf(s.Job.job, v1.EventTypeWarning, FailedCreateReason, "Error creating: %v", err)
Error creating service xxx
// If the job already exists do nothing.
if err != nil {
if k8s_errors.IsAlreadyExists(err) {
log.Infof("%v already exists.", s.jobName(index))
} else {
return k8sErrors.NewAggregate([]error{fmt.Errorf("Creating Job %v returned error.", newJ.ObjectMeta.Name), err})
s.recorder.Eventf(s.Job.job, v1.EventTypeWarning, FailedCreateReason, "Error creating: %v", err)
Error creating job: xxx
Can you update the PR description to explain what a recorder does? I suspect it records events and publishes them?
/ok-to-test
The recorder records events for the TFJob instance. For example, when we create a TFJob, we also need to create several services and jobs for it, and the recorder records an event for each of these creations.
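To make that description concrete, here is a minimal in-memory sketch in Go. It is not client-go: `Event`, `Recorder` and `createReplicas` are hypothetical, and the `Normal`/`SuccessfulCreate` values are assumed. A real controller would use client-go's `record.EventRecorder`, which sends events to the API server instead of storing them in a slice.

```go
package main

import "fmt"

// Event is a simplified, hypothetical view of a Kubernetes Event: the object
// it is about, its type, a machine-readable reason and a human message.
type Event struct {
	Object, Type, Reason, Message string
}

// Recorder keeps events in memory for illustration only.
type Recorder struct{ Events []Event }

// Eventf formats the message and appends the event.
func (r *Recorder) Eventf(object, eventType, reason, format string, args ...interface{}) {
	r.Events = append(r.Events, Event{
		Object:  object,
		Type:    eventType,
		Reason:  reason,
		Message: fmt.Sprintf(format, args...),
	})
}

// createReplicas is a hypothetical stand-in for the trainer's replica setup:
// for each replica it "creates" a service and a job, then records one event
// per creation against the owning TFJob.
func createReplicas(r *Recorder, tfJob string, replicas int) {
	for i := 0; i < replicas; i++ {
		name := fmt.Sprintf("%s-worker-%d", tfJob, i)
		r.Eventf(tfJob, "Normal", "SuccessfulCreate", "Created service: %v", name)
		r.Eventf(tfJob, "Normal", "SuccessfulCreate", "Created job: %v", name)
	}
}
```

As in the diff, where `Eventf` is called with `s.Job.job`, every event is attached to the owning TFJob rather than to the child service or job.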
@gaocegege Can you open up an issue to add E2E tests to verify the events are published? Please sync, but otherwise LGTM.
@jlewi OK, and I can take the issue.
@gaocegege Needs a rebase :)
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
Signed-off-by: Ce Gao <[email protected]>
CLAs look good, thanks!
PTAL
@ScorpioCPH Have your comments been addressed?
@jlewi @gaocegege Mostly; only some nits remain. LGTM.
This PR focuses on recorder support in Kubernetes. Now we can push events about the TFJob to Kubernetes:
Signed-off-by: Ce Gao [email protected]