add “cat” to base image - adds the ability to query missing cert file #4109

tshaiman · 2023-01-15T10:28:11Z

Proposal

following the issue where azure identity webhook did not mutate keda pods during node replacement
we need a better health check mechanism to enforce pod restart in case the cert is missing

today we cannot do healthcheck based on linux cat command as it is not part of the distroless base image .

please consider adding cat to the distroless image :

—copy-from busybox:xxx /bin/cat /bin/cat

Use-Case

No response

Is this a feature you are interested in implementing yourself?

No

Anything else?

#3977

tomkerkhove · 2023-01-16T08:18:46Z

Thoughts @kedacore/keda-maintainers ?

JorTurFer · 2023-01-17T08:14:01Z

I don't like the idea TBH, using cat an external user can get internal information, we use distroless image for reducing as much as possible the attack vector, introducing executables for edge cases doesn't make sense IMO, we should discuss how to execute a better probes based on the resources we are managing, but I wouldn't add any extra binary to the image, but I won't block this if other folks think that adding it is worth

tshaiman · 2023-01-17T08:48:42Z

we need something .
empty healthy check and disability to use our own check whether the cert exists cause our usage of Keda with all the Workload Identity issues described in other tickets as non-production grade product .

zroubalik · 2023-01-17T09:25:29Z

I agree with @JorTurFer on this, we should keep the image minimal as much as possible. @tshaiman but I think I can say that we are open to other suggestions.

tshaiman · 2023-01-17T10:38:14Z

as mentioned in previous tickets - we need a reliable health check . you don’t agree to add logic to the health check in code , and you don’t to allow us to inject the health check file with cat
in other words , keda contributors rather the health probe to always return “healthy” even when it’s not .
i have no other suggestions than to change this policy which is totally wrong in terms of reliability ( reporting something is healthy while it’s not)

zroubalik · 2023-01-17T11:34:31Z

you don’t agree to add logic to the health check in code

where we don't agree on this?

JorTurFer · 2023-01-17T11:37:17Z

I have some points about that. KEDA works, providing the external variables is responsibility of a 3rd party. We only said that we don't like the idea of adding extra binaries to the image only for solving an edge case. I also said that adding these values checking as part of the KEDA probes is dangerous because in your use case it's critical, but other use cases don't require it and other users can prefer to fail only in 1 ScaledObject than failing all the time in case of persistent errors related with WI because if the trouble is not transitory, KEDA won't work because it and maybe there are 1 or 2 ScaledObjects with that dependency and a huge amount without it.

I'm not against adding a dynamic probes inside KEDA based for example in a new parameter in the TriggerAuthentication, or as a general parameter (maybe better as general).

But I'd like to add that IMHO this issue should be fixed (and can be, indeed) in 3rd party side. K8s webhooks have a property named failurePolicy with 2 possible options:

Ignore: The validation/mutation will continue without any error if the webhooks server isn't available (current case, producing the fail)
Fail: The validation/mutation will fail webhooks server isn't available (blocking KEDA scheduling without the needed resources)

IDK if the managed WI add-on supports it or not but helm does it. Setting that value to Fail with a proper namespaceSelector to avoid locks (because the webhooks server can't be scheduled with itself working) can fix the issue better than adding a probe to restart KEDA if WI fails

tshaiman · 2023-01-17T11:45:09Z

@JorTurFer : I agree the root cause is with the failurePolicy and as far as I heard from WIF people, the fix is on its way.
my remark is more of a general one : in cloud native apps , you need to allow customer to inject their own health -checks
and if we can't do that , this is a bit of violation of the contract we have with k8s.
adding "cat" utility should not break the security barriers if we only add "cat" and not /bin/sh

JorTurFer · 2023-01-17T22:44:23Z

AFAIK, cat can execute arbitrary code, if an attacker mounts a compromised file and read it with cat, they could execute arbitrary code in the pod, with all the privileges that the pod has (usually, the pod has privileged access to a lot of places).

WRT the option of injecting custom probes, most probably I'm wrong, but I have just checked some helm charts in artifact hub (the top 3) and these are the results:

kube-prometheus-stack: Doesn't support any change in probes
cert-manager: Only supports the configuration of the general values like timeout or threshold. probe type and path are locked
nginx: supports changes in all the values, except the type, it's always httpGet. (only for controller, default backend is restricted only to general values like timeout and threshold)

I have checked only the top 3, maybe other charts are more flexible about the probes, but even if we supported the most flexible case of them (any parameter except the probe type) , it wouldn't be enough for your use case as you need a change from httpGet to exec.

I still think that adding a parameter for enabling a check as part of the already existing health check is better. Maybe as a first approach, we can just set an environment variable (helm allows setting extra envs given by user) and if that env is true, we can add the check in the current health checks. This can give you the option to restart KEDA if you think that it's critical, and other users have the option to don't do it if they think the opposite.
Something like PROBE_AAD_IDENTITYI, PROBE_AWS_IDENTITY and PROBE_GCP_IDENTITY for adding those 3 checks conditionally to KEDA checks (Aad-Pod-Identity can't be tested in this way AFAIK because it doesn't inject anything, only federated authentications do it)

WDYT about this @kedacore/keda-core-contributors ?

tomkerkhove · 2023-01-25T09:39:43Z

I can see why we want to stick with distroless image; but let's see how we can otherwise help with docs and guidance.

For health checks; I think it depends on what we want to achieve here and what defines "healthy". Maybe a side-car approach can be feasible.

tshaiman · 2023-01-25T10:02:14Z

a side car cannot assist here as if it will detect the missing cert and report in healthy - that would cause the side car container to restart and not the keda container to restart

tshaiman · 2023-01-25T10:03:01Z

keda unhealthy == it cannot authenticate / cannot access the token file

that i is good enough KPI in my opinion

zroubalik · 2023-01-25T10:12:04Z

I think this check should be independent of KEDA deployment. The probes should inform about health condition of KEDA operator, not one of scalers or auth providers.

This way on a shared cluster, one user can bring KEDA down, just by using incorrect auth settings.

We should provide additional ways how to inform about this. Maybe a status on TriggerAuth? Or a new resource that will report the status?

tomkerkhove · 2023-01-25T11:07:22Z

We should provide additional ways how to inform about this.

I think this is the sweet spot combined with #3764.

Then you can both alert based on logs + event on the TriggerAuthentication resource's events

tomkerkhove · 2023-01-25T11:07:38Z

Actually this is already tracked in #3629

zroubalik · 2023-01-25T12:01:43Z

@tshaiman are you willing to draft a proposal using approach mentioned above?

stale · 2023-03-26T12:11:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2023-04-02T21:55:39Z

This issue has been automatically closed due to inactivity.

stale · 2023-06-08T12:42:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2023-06-15T13:06:11Z

This issue has been automatically closed due to inactivity.

tshaiman added feature-request All issues for new features that have not been committed to needs-discussion labels Jan 15, 2023

tomkerkhove added this to Roadmap - KEDA Core Jan 15, 2023

github-project-automation bot moved this to Proposed in Roadmap - KEDA Core Jan 15, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Mar 26, 2023

stale bot closed this as completed Apr 2, 2023

github-project-automation bot moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Apr 2, 2023

JorTurFer reopened this Apr 3, 2023

github-project-automation bot moved this from Ready To Ship to Proposed in Roadmap - KEDA Core Apr 3, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Apr 3, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Jun 8, 2023

stale bot closed this as completed Jun 15, 2023

github-project-automation bot moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add “cat” to base image - adds the ability to query missing cert file #4109

add “cat” to base image - adds the ability to query missing cert file #4109

tshaiman commented Jan 15, 2023

tomkerkhove commented Jan 16, 2023

JorTurFer commented Jan 17, 2023

tshaiman commented Jan 17, 2023

zroubalik commented Jan 17, 2023

tshaiman commented Jan 17, 2023

zroubalik commented Jan 17, 2023

JorTurFer commented Jan 17, 2023 •

edited

Loading

tshaiman commented Jan 17, 2023

JorTurFer commented Jan 17, 2023 •

edited

Loading

tomkerkhove commented Jan 25, 2023

tshaiman commented Jan 25, 2023

tshaiman commented Jan 25, 2023

zroubalik commented Jan 25, 2023

tomkerkhove commented Jan 25, 2023

tomkerkhove commented Jan 25, 2023

zroubalik commented Jan 25, 2023

stale bot commented Mar 26, 2023

stale bot commented Apr 2, 2023

stale bot commented Jun 8, 2023

stale bot commented Jun 15, 2023

add “cat” to base image - adds the ability to query missing cert file #4109

add “cat” to base image - adds the ability to query missing cert file #4109

Comments

tshaiman commented Jan 15, 2023

Proposal

Use-Case

Is this a feature you are interested in implementing yourself?

Anything else?

tomkerkhove commented Jan 16, 2023

JorTurFer commented Jan 17, 2023

tshaiman commented Jan 17, 2023

zroubalik commented Jan 17, 2023

tshaiman commented Jan 17, 2023

zroubalik commented Jan 17, 2023

JorTurFer commented Jan 17, 2023 • edited Loading

tshaiman commented Jan 17, 2023

JorTurFer commented Jan 17, 2023 • edited Loading

tomkerkhove commented Jan 25, 2023

tshaiman commented Jan 25, 2023

tshaiman commented Jan 25, 2023

zroubalik commented Jan 25, 2023

tomkerkhove commented Jan 25, 2023

tomkerkhove commented Jan 25, 2023

zroubalik commented Jan 25, 2023

stale bot commented Mar 26, 2023

stale bot commented Apr 2, 2023

stale bot commented Jun 8, 2023

stale bot commented Jun 15, 2023

JorTurFer commented Jan 17, 2023 •

edited

Loading

JorTurFer commented Jan 17, 2023 •

edited

Loading