
[BUG] CloneSet validating webhook rejects pod with restartPolicy OnFailure even though it is supported #794

Closed
shuyingliang opened this issue Nov 9, 2021 · 22 comments

Comments

@shuyingliang

shuyingliang commented Nov 9, 2021

What happened:
We are deploying a workload using CloneSet with restartPolicy set to OnFailure, because one of the colocated containers is not a long-running service and needs to exit after its job is done.

We configured the CloneSet validating webhook (vcloneset.kb.io) with failurePolicy: Ignore.

However, the pod request was rejected with the error:

"admission webhook \"vcloneset.kb.io\" denied the request: spec.template.spec.restartPolicy: Unsupported value: \"OnFailure\": supported values: \"Always\""}

We then disabled the webhook completely (via hacks such as giving the webhook a wrong cert in the caBundle field; the controller-manager happily proceeds, but the webhook no longer takes effect), and our workload deployed successfully with the expected behavior (the colocated container in the pod is not restarted if it exits successfully).

I noticed code in the code base that makes the webhook fail the request whenever the restartPolicy is not Always.

What you expected to happen:

We expect that

  1. When the webhook failurePolicy is Ignore, the request should not be failed just because the pod restartPolicy is not Always.
  2. OnFailure is expected to behave correctly, so why add guards such as the one here? Can we remove it?

How to reproduce it (as minimally and precisely as possible):
Just create a CloneSet with the pod restartPolicy set to OnFailure and the validating webhook failurePolicy set to Ignore,
and you will see the pod request rejected with the aforementioned error.

Environment
We are using Kruise version 0.9.0, but the same issue exists in other versions.
If the problem is fixed, we'd appreciate the change being made available for versions >= 0.9.0.

@shuyingliang shuyingliang added the kind/bug Something isn't working label Nov 9, 2021
@FillZpp
Member

FillZpp commented Nov 9, 2021

@wgitg Actually, workloads for both stateless and stateful long-running services, e.g. Deployment, CloneSet, StatefulSet, are all limited so that restartPolicy can only be Always.
Validation for Deployment/ReplicaSet

This is to avoid continuously creating pods when pods enter the Completed state, whether that is expected or not.
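
For context, here is a minimal sketch (not the actual Kruise source; the package and function names are hypothetical) of the kind of guard being discussed, mirroring the upstream restriction that the pod template's restartPolicy must be Always:

package validating

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/validation/field"
)

// validateRestartPolicy rejects any restartPolicy other than Always on the
// pod template, which is what produces the "Unsupported value" error quoted above.
func validateRestartPolicy(template *corev1.PodTemplateSpec, fldPath *field.Path) field.ErrorList {
    allErrs := field.ErrorList{}
    if template.Spec.RestartPolicy != corev1.RestartPolicyAlways {
        allErrs = append(allErrs, field.NotSupported(
            fldPath.Child("spec", "restartPolicy"),
            template.Spec.RestartPolicy,
            []string{string(corev1.RestartPolicyAlways)},
        ))
    }
    return allErrs
}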

@shuyingliang
Author

shuyingliang commented Nov 9, 2021

This is to avoid continuously creating pods when pods enter the Completed state, whether that is expected or not.

@FillZpp I think you meant to say that restartPolicy: Always restarts applications whenever they stop, regardless of whether they stopped expectedly or not. :)

To circle back to the first point here: any insight into why, with the webhook configuration failurePolicy set to Ignore, the pod request with restartPolicy: OnFailure is still rejected? Isn't the validating webhook supposed to ignore/allow such a pod request?

@FillZpp
Member

FillZpp commented Nov 9, 2021

@wgitg You can read this doc. Ignore means that an error calling the webhook is ignored and the API request is allowed to continue, for example when the webhook has crashed and the apiserver cannot connect to it. It does not cover an expected rejection carried in the webhook response.

@shuyingliang
Author

shuyingliang commented Nov 9, 2021

@FillZpp The thing is that the CloneSet validating webhook rejects the request even though we specified the webhook configuration failurePolicy as Ignore.

@FillZpp
Member

FillZpp commented Nov 9, 2021

You may need to understand that (see the sketch below):

  • A rejection from the webhook is an expected failure; the apiserver will not ignore it.
  • A connection error is an unexpected failure; the apiserver will ignore it.
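
As an illustration (using the Go API types rather than a YAML manifest, purely as a sketch), the failurePolicy knob sits on the webhook configuration object and only governs failed calls:

package example

import (
    admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
)

// failurePolicy: Ignore tells the apiserver what to do when the call to the
// webhook itself fails (timeout, connection refused, bad TLS). It has no
// effect on a well-formed response that sets allowed: false.
var ignore = admissionregistrationv1.Ignore

var vclonesetWebhook = admissionregistrationv1.ValidatingWebhook{
    Name:          "vcloneset.kb.io",
    FailurePolicy: &ignore, // applies only to failed calls, not to denials
    // ClientConfig, Rules, SideEffects, AdmissionReviewVersions omitted for brevity.
}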

@shuyingliang
Author

Well, we are talking about the first case here, where we expected the apiserver to ignore the error from the webhook and allow the request even though the webhook returns an error, as specified in the doc:
"an error calling the webhook is ignored and the API request is allowed to continue"

Is it possible that the error here

return admission.Errored(http.StatusUnprocessableEntity, allErrs.ToAggregate())

returned from the CloneSet validating webhook is not ignored by the apiserver and is instead used to reject the request?

@FillZpp
Member

FillZpp commented Nov 9, 2021

But the question is: if the apiserver allowed the request no matter what response the webhook returns under Ignore, what would be the point of this webhook existing...

@shuyingliang
Author

shuyingliang commented Nov 9, 2021

@FillZpp Good question. But isn't that what the doc says: "an error calling the webhook is ignored and the API request is allowed to continue"?

The doc does not specify which errors are ignored, or whether every error should be ignored. Clarifying this will help us know whether anything needs to change in how the CloneSet validating webhook returns errors, or whether nothing needs to be done. Do you agree?

Maybe we can get more insight from the apiserver code? Are you familiar with the part of the apiserver code that implements this behavior ("an error calling the webhook is ignored and the API request is allowed to continue")? Searching the code...

@FillZpp
Member

FillZpp commented Nov 9, 2021

It is the error for the HTTPS call, not the rejection in the response.

You have to know that when the webhook rejects a request, the HTTPS response code is still 200; the rejection message is carried in the response body, such as:

{
    "allowed": false,
    "status": {
        "code": 400,
        "message": "spec.template.spec.restartPolicy: Unsupported value ......"
    }
}
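
As a rough illustration (assuming a controller-runtime based webhook using the admission/v1 types, which is where the admission.Errored call quoted earlier comes from), such a denial is built roughly like this and still travels over a successful HTTP 200 call:

package example

import (
    "net/http"

    admissionv1 "k8s.io/api/admission/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// denyRestartPolicy sketches the shape of the response above: the webhook call
// itself succeeds (HTTP 200), and the rejection is expressed as allowed: false
// plus a status code and message inside the AdmissionResponse body.
func denyRestartPolicy(msg string) admission.Response {
    return admission.Response{
        AdmissionResponse: admissionv1.AdmissionResponse{
            Allowed: false,
            Result: &metav1.Status{
                Code:    http.StatusUnprocessableEntity, // 422, matching the admission.Errored call quoted earlier
                Message: msg,
            },
        },
    }
}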

@shuyingliang
Author

OK, following your logic, in the case where failurePolicy is Ignore, shouldn't the HTTPS response then look like this?

{
    "allowed": true,
    "status": {
        "code": 200,
        "message": "spec.template.spec.restartPolicy: Unsupported value ......"
    }
}

@FillZpp
Member

FillZpp commented Nov 9, 2021

No, the webhook itself does not care about failurePolicy. It is only used by the kube-apiserver to ignore failed HTTPS calls, i.e. to skip validation when the webhook is abnormal.

@shuyingliang
Author

Understood. But back to the original ask: why is the request rejected by the apiserver (so basically no pod can be created) when the CloneSet validating webhook's failurePolicy is specified as Ignore? What do you suggest at this point?

@FillZpp
Member

FillZpp commented Nov 9, 2021

I don't quite understand your question. As I said, even when failurePolicy is Ignore, the kube-apiserver gets the 200 response from the webhook and uses the status code 400 in it as the return code to the user.

@shuyingliang
Author

As I said, even when failurePolicy is Ignore, the kube-apiserver gets the 200 response from the webhook and uses the status code 400 in it as the return code to the user.

Thanks for the summary.
IIUC, given the situation where no pods are created for the CloneSet even with the webhook configuration failurePolicy set to Ignore, you think nothing needs to be done on the CloneSet side to address it?

@FillZpp
Member

FillZpp commented Nov 9, 2021

I'm not sure whether we should provide a new feature-gate that allows users to create a CloneSet with an unrestricted restartPolicy. In your case, is it that some of the containers in the Pod will exit while the other containers always remain running?

@shuyingliang
Author

shuyingliang commented Nov 9, 2021

@FillZpp Yes. The use case is that not all of the applications are long-running services; some colocated containers in the same CloneSet pod will do some work and exit.

A per-container restartPolicy might be expensive to support (?), but the feature gate seems like a great idea to enable this flexibility. Can you enable CloneSet to do that?

@veophi
Member

veophi commented Nov 10, 2021

@wgitg For a Pod with RestartPolicy=OnFailure, its restart behavior is as follows (a sketch of this rule follows the list):

  • For containers that have already exited with exitCode=0, an in-place update will not be triggered by replacing their image;
  • For running containers, whether the in-place update can complete depends on the exitCode of the container when it is stopped. If exitCode=0, the in-place update will not be triggered either (the container will just be killed, and not be recreated).
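
A hedged sketch of the rule described above, in terms of the kubelet's restart semantics rather than Kruise's actual in-place update code: under OnFailure, a container that terminates with exit code 0 is not restarted, so killing it for an in-place image replacement does not bring up a new container with the new image.

package example

import (
    corev1 "k8s.io/api/core/v1"
)

// willBeRestarted illustrates the semantics behind the two bullet points above:
// once a container has terminated with exit code 0 under OnFailure, it will not
// be restarted, so an in-place image update cannot take effect on it.
func willBeRestarted(policy corev1.RestartPolicy, status corev1.ContainerStatus) bool {
    term := status.State.Terminated
    if term == nil {
        // Container is still running or waiting; there is no restart decision yet.
        return false
    }
    switch policy {
    case corev1.RestartPolicyAlways:
        return true
    case corev1.RestartPolicyOnFailure:
        return term.ExitCode != 0 // exit 0 => treated as completed, no restart
    default: // corev1.RestartPolicyNever
        return false
    }
}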

@FillZpp
Member

FillZpp commented Nov 10, 2021

@wgitg If you are going to use the in-place update feature, please note the comment above. Is that acceptable to you?

Besides, could those colocated containers be defined as init containers, so that you don't have to set the OnFailure restartPolicy?

@shuyingliang
Author

@veophi @FillZpp Thanks for the clarification above! It is a surprise that OnFailure can affect the in-place update feature. :(

Yes, we do want to leverage the in-place update feature for all containers in the same pod. In fact, part of the reason we did not use an init container is that in-place update only works on app containers, and we did not want changes to the init container of the colocated short-running app to cause the entire pod to be recreated and thus affect the main applications in the same pod.

However, I am still pondering why the in-place update flow is mixed with the pod restart policy like this. In-place update is about restarting containers to deploy a new version, so shouldn't it happen irrespective of how a container exited before?

@stale

stale bot commented Feb 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Feb 8, 2022
@shuyingliang
Author

Given the complications discussed, we think the alternative proposal can help us truly achieve the in-place update. We may close this issue.

@stale stale bot removed the wontfix This will not be worked on label Feb 10, 2022
@FillZpp
Member

FillZpp commented Feb 10, 2022

closed by #901

@FillZpp FillZpp closed this as completed Feb 10, 2022