Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore 404 when deleting defunct K8S pod #9194

Conversation

tomachristian
Copy link
Contributor

@tomachristian tomachristian commented Oct 20, 2024

Microsoft Reviewers: Open in CodeFlow

We started getting these exceptions after we enabled DeleteDefunctSiloPods:

Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"xxxxx-fc9dd7f96-gvvt2\" not found","reason":"NotFound","details":{"name":"xxxxx-fc9dd7f96-gvvt2","kind":"pods"},"code":404}

These cases are actually common, as K8S might have already killed/deleted the pod, so when we get a NotFound exception then we can ignore it and not log the exception.

Comment on lines 259 to 266
catch (Exception exception)
// Ignore NotFound errors, as the pod may have already been deleted by other means
when (exception is not HttpOperationException { Response.StatusCode: HttpStatusCode.NotFound })
{
_logger.LogError(exception, "Error deleting pod {PodName} in namespace {PodNamespace} corresponding to defunct silo {SiloAddress}", change.Name, _podNamespace, change.SiloAddress);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
catch (Exception exception)
// Ignore NotFound errors, as the pod may have already been deleted by other means
when (exception is not HttpOperationException { Response.StatusCode: HttpStatusCode.NotFound })
{
_logger.LogError(exception, "Error deleting pod {PodName} in namespace {PodNamespace} corresponding to defunct silo {SiloAddress}", change.Name, _podNamespace, change.SiloAddress);
}
catch (Exception exception)
{
// Ignore NotFound errors, as the pod may have already been deleted by other means
if (exception is not HttpOperationException { Response.StatusCode: HttpStatusCode.NotFound })
{
_logger.LogError(exception, "Error deleting pod {PodName} in namespace {PodNamespace} corresponding to defunct silo {SiloAddress}", change.Name, _podNamespace, change.SiloAddress);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could add a second catch clause to ignore those errors. I prefer catch + if, though.

@tomachristian tomachristian force-pushed the k8s/no_exception_when_defunct_pod_not_found branch from 7697e6a to 9ddb89e Compare October 20, 2024 15:12
@tomachristian
Copy link
Contributor Author

tomachristian commented Oct 20, 2024

The tests failed, but it looks like it was an unstable/transient one... maybe a re-run would "fix" it?

@ReubenBond
Copy link
Member

@tomachristian you're correct - that test relies a bit on luck. We need to update it to make its own luck (steer grain placement rather than hope + retry until it happens the way it wants)

@ReubenBond
Copy link
Member

ReubenBond commented Oct 20, 2024

The subsequent failure was due to an issue in the build pipeline which I need to fix - it wasn't allowed to overwrite a build artifact. We should include the attempt number in the artifact name.

@ReubenBond ReubenBond merged commit 474ea78 into dotnet:main Oct 20, 2024
20 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants