-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make data mover fail early #7052
Make data mover fail early #7052
Conversation
470ef7e
to
481dc65
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #7052 +/- ##
==========================================
+ Coverage 61.70% 61.77% +0.07%
==========================================
Files 258 259 +1
Lines 27740 27903 +163
==========================================
+ Hits 17117 17238 +121
- Misses 9420 9460 +40
- Partials 1203 1205 +2 ☔ View full report in Codecov by Sentry. |
pkg/util/kube/pod.go
Outdated
} else if pod.Status.Phase == corev1api.PodPending { | ||
// Check the conditions for Pending reason to see if it's unschedulable | ||
for _, condition := range pod.Status.Conditions { | ||
if condition.Reason == corev1api.PodReasonUnschedulable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// PodReasonUnschedulable reason in PodScheduled PodCondition means that the scheduler
// can't schedule the pod right now, for example due to insufficient resources in the cluster.
I don't think we can always fail earlier for this case, if the resources fill back later, the pod should have started successfully, but we failed it unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the unschedulable check first for a later more specific scenario we could add it back
481dc65
to
6d34ace
Compare
6d34ace
to
841cfd0
Compare
841cfd0
to
e35f670
Compare
In order to prevent potential misjudgments under uncertain conditions, function named |
e35f670
to
c7004f5
Compare
err := UpdateDataUploadWithRetry(context.Background(), r.client, types.NamespacedName{Namespace: du.Namespace, Name: du.Name}, r.logger.WithField("dataupload", du.Name), | ||
func(dataUpload *velerov2alpha1api.DataUpload) { | ||
dataUpload.Spec.Cancel = true | ||
dataUpload.Status.Message = fmt.Sprintf("dataupload mark as cancel to failed early for exposing pod %s/%s is in abnormal status", pod.Namespace, pod.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to append the reason why the pod is unrecoverable into dataUpload.Status.Message
, which is helpful for troubleshooting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
err := UpdateDataDownloadWithRetry(context.Background(), r.client, types.NamespacedName{Namespace: dd.Namespace, Name: dd.Name}, r.logger.WithField("datadownlad", dd.Name), | ||
func(dataDownload *velerov2alpha1api.DataDownload) { | ||
dataDownload.Spec.Cancel = true | ||
dataDownload.Status.Message = fmt.Sprintf("datadownload mark as cancel to failed early for exposing pod %s/%s is in abnormal status", pod.Namespace, pod.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to append the reason why the pod is unrecoverable into dataDownload.Status.Message
, which is helpful for troubleshooting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
c7004f5
to
e060d55
Compare
} | ||
log.Debug("Exposed pod is in abnormal status, and datadownload is marked as cancel") | ||
} else { | ||
log.Debug("Waiting for exposed pod running...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to remove this log, this doesn't look necessary but will be printed multiple times. We should keep the quality of the logs even for debug logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Signed-off-by: Ming <[email protected]>
e060d55
to
03dff10
Compare
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Fixes #(issue)
#6562
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.