Fix bug with jobs not being marked as completed.
* kubeflow#344, which switched to creating pods directly, introduced a bug
in how we get the replica status.

* Our presubmits/postsubmits were failing, but this went unnoticed because
the git status check was incorrectly reported as succeeded.

* The bug occurs because we try to get the pod status by name, but the name
we look up doesn't include the random salt that is part of the actual pod name.

* The code in question is left over from when we were using job controllers
and first got the status of the job controller. We incorrectly changed that
code to get the pod by name. The correct thing is to just list pods by label;
the code below already does that, so we just need to delete some code.

* Fix kubeflow#500
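
The difference between the two lookups described above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions: the `Pod` struct, the helpers `getByName`, `listByLabels`, and `examplePods`, the label keys `job_name` and `task_index`, and the pod names are all hypothetical stand-ins, not the operator's real types or the Kubernetes client API. It shows why an exact-name lookup misses a pod whose real name carries a random suffix, while a label match still finds it.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	Name   string
	Labels map[string]string
	Phase  string
}

// examplePods returns pods whose names end in a random salt,
// as happens when pods are created directly (names are made up).
func examplePods() []Pod {
	return []Pod{
		{
			Name:   "myjob-worker-0-x7k2p",
			Labels: map[string]string{"job_name": "myjob", "task_index": "0"},
			Phase:  "Succeeded",
		},
		{
			Name:   "myjob-worker-1-q9d4m",
			Labels: map[string]string{"job_name": "myjob", "task_index": "1"},
			Phase:  "Running",
		},
	}
}

// getByName mimics a lookup by exact pod name.
func getByName(pods []Pod, name string) (Pod, bool) {
	for _, p := range pods {
		if p.Name == name {
			return p, true
		}
	}
	return Pod{}, false
}

// listByLabels returns every pod whose labels contain all selector pairs.
func listByLabels(pods []Pod, selector map[string]string) []Pod {
	var out []Pod
	for _, p := range pods {
		match := true
		for k, v := range selector {
			if p.Labels[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := examplePods()

	// The generated name lacks the salt, so the exact lookup fails.
	_, found := getByName(pods, "myjob-worker-0")
	fmt.Println("found by name:", found)

	// Matching on labels does not depend on the salted name.
	matched := listByLabels(pods, map[string]string{"job_name": "myjob", "task_index": "0"})
	for _, p := range matched {
		fmt.Println("found by label:", p.Name, p.Phase)
	}
}
```

In this model a failed name lookup corresponds to the deleted early return of an unknown status, which is how a finished replica could fail to be reported as succeeded.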
jlewi committed Mar 23, 2018
1 parent eec56b5 commit 3d53bc5
Showing 1 changed file with 0 additions and 10 deletions.
10 changes: 0 additions & 10 deletions pkg/trainer/replicas.go
@@ -331,16 +331,6 @@ func replicaStatusFromPodList(l v1.PodList, name string) tfv1alpha1.ReplicaState

 // GetSingleReplicaStatus returns status for a single replica
 func (s *TFReplicaSet) GetSingleReplicaStatus(index int32) tfv1alpha1.ReplicaState {
-	p, err := s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).Get(s.genName(index), meta_v1.GetOptions{})
-
-	if err != nil {
-		return tfv1alpha1.ReplicaStateUnknown
-	}
-
-	if v1.PodSucceeded == p.Status.Phase {
-		return tfv1alpha1.ReplicaStateSucceeded
-	}
-
 	labels := s.LabelsByIndex(index)
 	selector, err := labels.ToSelector()
 	if err != nil {