ERROR scalehandler Error getting scale decision #3903

Closed
aradhell opened this issue Nov 23, 2022 · 33 comments
Labels
bug Something isn't working

Comments

@aradhell

aradhell commented Nov 23, 2022

Report

ERROR	scalehandler	Error getting scale decision	{"scaledobject.Name": "scaled-object-x-touch", "scaledObject.Namespace": "x", "scaleTarget.Name": "x-touch", "error": "error resolving secrets for ScaleTarget: error reading config ref &ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:config-map-x-6665h6bgt4,},Optional:nil,} on namespace x/: ConfigMap \"config-map-x-6665h6bgt4\" not found"}

Expected Behavior

scale target

Actual Behavior

Scaling fails because of a "ConfigMap not found" error.

Steps to Reproduce the Problem

  1. change configmap yaml and redeploy

Logs from KEDA operator

ERROR	scalehandler	Error getting scale decision	{"scaledobject.Name": "scaled-object-x-touch", "scaledObject.Namespace": "x", "scaleTarget.Name": "x-touch", "error": "error resolving secrets for ScaleTarget: error reading config ref &ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:config-map-x-6665h6bgt4,},Optional:nil,} on namespace x/: ConfigMap \"config-map-x-6665h6bgt4\" not found"}

KEDA Version

2.7.1

Kubernetes Version

< 1.23

Platform

Amazon Web Services

Scaler Details

rabbitmq

Anything else?

@aradhell aradhell added the bug Something isn't working label Nov 23, 2022
@tomkerkhove tomkerkhove moved this to Proposed in Roadmap - KEDA Core Nov 23, 2022
@JorTurFer
Member

Hello @aradhell ,
That error points to a problem in the workload configuration. KEDA is trying to retrieve that configMap to check the environment variables, but the configMap doesn't exist in the namespace, so the lookup fails with this error. Is it optional?

@aradhell
Author

Hi @JorTurFer

We put static env variables in configmaps to share them between deployments.

Configmaps are sometimes updated by our CD tool (when we add a new env var) and the name changes. I guess a new one is being created instead of the existing one being updated.

@JorTurFer
Member

The problem is there: KEDA can't read the configMap because it doesn't exist during the loop. KEDA executes reconciliation loops every X seconds, so once your configMap is there, KEDA should continue without more errors.
Is this correct? Does KEDA work after that?

@aradhell
Author

Hi @JorTurFer, it is always trying to find this specific configmap with the same id. The configmap is deleted and redeployed, so its name changes, but KEDA keeps looking for the deleted one, so the error never ends and scaling doesn't work.

@JorTurFer
Member

I'd need to replicate the behaviour to troubleshoot it... Do you need to recreate the configMap with a different name for any specific reason? Are you doing it to ensure that pod is recreated on any change in the configMap?

@aradhell
Author

@JorTurFer hello,

We use fluxcd, it recreates configmap when we send a change so I guess yes.

@JorTurFer
Member

> We use fluxcd, it recreates configmap when we send a change so I guess yes.

Do you mean that you use the configMap name change to recreate the pod?

@aradhell
Author

Yes @JorTurFer, I guess the CD tool does this. As a workaround, I delete the scaledobjects and recreate them with the correct configmap name:

kubectl get scaledobject -n x --no-headers=true | awk '/scaled-object/{print $1}' | xargs kubectl delete -n x scaledobject

@JorTurFer
Member

JorTurFer commented Nov 24, 2022

I wouldn't use that approach because there is an easier one: add the checksum of the configMap as an annotation in the pod template. Any change in the pod template will recreate the pods, reloading the configMap without requiring any naming change.
Do you use helm by chance? If you are using helm, there is an example in the helm docs.
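
For setups without helm, the checksum-annotation idea above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the `checksum/config` annotation key follows the convention used in the helm docs, and the sample data is made up. It hashes the ConfigMap data in a stable way and builds a merge patch for the pod template, so a changed ConfigMap yields a changed annotation and therefore a rollout.

```python
import hashlib
import json


def configmap_checksum(data: dict) -> str:
    """Return a stable SHA-256 hex digest of a ConfigMap's data."""
    # sort_keys makes the digest independent of key order.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pod_template_patch(data: dict, key: str = "checksum/config") -> dict:
    """Build a merge patch that sets the checksum annotation on the pod template."""
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {key: configmap_checksum(data)}}
            }
        }
    }


if __name__ == "__main__":
    # Made-up ConfigMap contents, for illustration only.
    patch = pod_template_patch({"QUEUE_NAME": "touch", "LOG_LEVEL": "info"})
    print(json.dumps(patch, indent=2))
```

The resulting JSON could then be applied with something like `kubectl patch deployment <name> --type merge -p '<json>'`, or generated by the CD pipeline before the manifests are applied.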

@aradhell
Author

Thanks for the suggestion and your help. Unfortunately we don't use helm. Can I help you reproduce it in any way, or with anything?

@JorTurFer
Member

JorTurFer commented Nov 24, 2022

Just to confirm, are you using KEDA v2.7.1?
You can reproduce the same behavior with raw manifests or kustomize. Basically you need to annotate the pod template; you can do it dynamically where you parse and generate the configMap, or just add the annotation manually to the pod template.
The point here is that I don't think this is a KEDA issue, because I feel the issue is with linked resources that don't exist.
Basically, KEDA doesn't cache the scaler if there is an error during the generation, so once your CD system creates the configMap, KEDA should be able to get it and the error should disappear.
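
The "don't cache on error" behaviour described above can be illustrated with a small Python sketch. This is not KEDA's actual code (KEDA is written in Go); `ScalerCache`, `build` and the fake builder are hypothetical names used only to show the pattern: a failed build stores nothing, so the next loop retries from scratch.

```python
from typing import Callable, Dict


class ScalerCache:
    """Caches built scalers, but only when the build succeeds."""

    def __init__(self, build: Callable[[str], str]) -> None:
        self._build = build
        self._cache: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        scaler = self._build(key)  # may raise; nothing is cached in that case
        self._cache[key] = scaler
        return scaler


if __name__ == "__main__":
    existing_configmaps = set()

    def build(name: str) -> str:
        # Hypothetical builder: fails while the referenced ConfigMap is missing.
        if name not in existing_configmaps:
            raise LookupError(f'ConfigMap "{name}" not found')
        return f"scaler-for-{name}"

    cache = ScalerCache(build)
    try:
        cache.get("demo-1")
    except LookupError as err:
        print("first loop:", err)

    existing_configmaps.add("demo-1")  # the CD system creates the ConfigMap
    print("second loop:", cache.get("demo-1"))
```

Under this pattern a missing ConfigMap causes an error on one loop only; it cannot explain an error that persists after the new ConfigMap exists, which is why the deployment's actual pod template matters here.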

@aradhell
Author

aradhell commented Nov 24, 2022

Yes, but it doesn't. We already have the new configmap in the resources. To give an example of the process:

configmap is in the cluster. named as

we changed the configmap (added/removed env vars etc.) and redeployed it as configmap-2dhs1,

KEDA is still looking for configmap-1fgj1

And if I remove the scaledobject and re-create, it works fine

@JorTurFer
Member

I have just tried this:

  1. I deployed a configMap (demo-1) and a deployment using it as envFrom
  2. I deployed another configMap (demo-2), removed the first one and I changed the deployment to use the new configMap

I can't see any error in KEDA and it's taking values from the new configMap without any other change.
Could you confirm what KEDA version you are using?

@JorTurFer
Member

JorTurFer commented Nov 24, 2022

Going deeper, could you share KEDA operator logs where the error is shown, and also the output of kubectl describe deploy YOUR_DEPLOYMENT_NAME -n YOUR_NAMESPACE during the errors please?

@aradhell
Author

@JorTurFer deployment output of keda or my deployment? Yes it is 2.7.1

@JorTurFer
Member

I'd like to see the description of your deployment during the problems, to check whether KEDA is getting the correct pod template or not. AFAICS, we get the pod template on every cycle, so it's just to double-check the situation. I mean, if we see that the deployment is still using the old configMap while KEDA has errors, that's the problem. Otherwise it's difficult to figure out the root cause because I haven't been able to reproduce the issue :(

@aradhell
Author

Oh, I already checked it, the deployment had the new configmap. 100% sure.
I just deleted all scaledObjects, I will add the deployment output when I notice the error again.

@JorTurFer
Member

JorTurFer commented Nov 24, 2022

Is there any way to reproduce your scenario from scratch? I tried and I couldn't reproduce it

@pedro-stanaka
Contributor

pedro-stanaka commented Jan 16, 2023

I am facing this same issue. In my case the workloads depend on some secrets that might or might not be there, since they are generated by a tool.

The error I see on keda:

"error resolving secrets for ScaleTarget: error resolving secret name &SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:my-secret-name,},API_KEY,Optional:*true,} for env API_KEY in namespace myapp-production"

Interestingly enough, you can even see that it has Optional set to true.
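
To make the expectation concrete: for a plain Kubernetes pod, an optional secret reference that cannot be resolved just leaves the env var unset instead of failing. Below is a minimal Python sketch of that rule. It is not KEDA's resolver; `SecretKeyRef`, `resolve_env` and the in-memory `secrets` dict are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SecretKeyRef:
    env_name: str
    secret_name: str
    key: str
    optional: bool = False


def resolve_env(refs: List[SecretKeyRef],
                secrets: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Resolve env vars from secrets, skipping optional refs that are missing."""
    resolved: Dict[str, str] = {}
    for ref in refs:
        secret = secrets.get(ref.secret_name)
        if secret is None or ref.key not in secret:
            if ref.optional:
                continue  # optional and missing: leave the env var unset
            raise LookupError(
                f"secret {ref.secret_name!r} key {ref.key!r} not found "
                f"for env {ref.env_name}"
            )
        resolved[ref.env_name] = secret[ref.key]
    return resolved


if __name__ == "__main__":
    refs = [SecretKeyRef("API_KEY", "my-secret-name", "API_KEY", optional=True)]
    # The secret does not exist yet, but the reference is optional.
    print(resolve_env(refs, secrets={}))
```

With `optional=True` the missing secret yields an empty result rather than an exception; with the default `optional=False` the same call raises.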

I am running:

2023-01-16T14:15:44Z    INFO    setup   Running on Kubernetes 1.22      {"version": "v1.22.15"}
2023-01-16T14:15:44Z    INFO    setup   Starting manager
2023-01-16T14:15:44Z    INFO    setup   KEDA Version: 2.8.1
2023-01-16T14:15:44Z    INFO    setup   Git Commit: 12783c1340a13ba6776d1d6a64127d88e7828aab

The steps I followed to reproduce:

Apply the v1/Deployment, deploy KEDA, create ScaledObject.

@JorTurFer
Member

Hi @pedro-stanaka ,
That problem has been solved in v2.9 thanks to #3694
The title says ScaledJob but I think it solves both as the code is shared

@pedro-stanaka
Contributor

Can you backport it to 2.8? I can't use 2.9 because I am on k8s 1.22. I can submit a PR if possible.

@JorTurFer
Member

WDYT @kedacore/keda-core-contributors ?

@pedro-stanaka
Contributor

Hi, I just wanted to know what time window I can expect for an answer here. Otherwise, I will be running a fork of the operator for now (😞), as this is blocking me from using KEDA in production.

I wanted to cherry pick that commit of yours to a release branch, but I see that the release branch for 2.8.x does not even exist.

Thanks for the tool and for the work put into it btw.

@zroubalik
Member

To be honest, I am not a big fan of releasing another 2.8 release, because I think we would need to fix security issues as well, so as not to ship a release with known CVEs.

But I completely understand your needs. If running a fork is okay for you, then it might be an easier solution in the short term.

But let's see what others think.

@pedro-stanaka
Contributor

pedro-stanaka commented Jan 17, 2023

Actually, the fork was not a solution, but rather a stop-gap (temporary) remediation to the problem. Is there any policy in place on what versions get patches and CVE fixes? I imagine a lot of users still are using 2.8 (or even older versions).

@pedro-stanaka
Contributor

pedro-stanaka commented Jan 17, 2023

Can I deploy 2.9 in K8S 1.22, even if it is not supported? What would be the behavior here then? I have a mix of clusters, with versions ranging from 1.22 to 1.24.

@JorTurFer
Member

JorTurFer commented Jan 17, 2023

> Can I deploy 2.9 in K8S 1.22, even if it is not supported? What would be the behavior here then? I have a mix of clusters, with versions ranging from 1.22 to 1.24.

No, you can't. There is a breaking change in the Kubernetes APIs: v2.9 has migrated from autoscaling/v2beta2 to autoscaling/v2. On k8s < 1.23, KEDA will fail during startup with a message saying that autoscaling/v2 isn't available.
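
A small preflight check can make this constraint explicit for a fleet with mixed cluster versions. The sketch below is Python with a hypothetical helper name; it relies only on the fact stated above that autoscaling/v2 is not available before Kubernetes 1.23.

```python
import re


def hpa_api_version(k8s_version: str) -> str:
    """Pick the HPA API group/version a cluster of this version can serve."""
    # Accepts strings such as "v1.22.15" or "1.23"; anything after the
    # minor number is ignored.
    match = re.match(r"v?(\d+)\.(\d+)", k8s_version)
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {k8s_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return "autoscaling/v2" if (major, minor) >= (1, 23) else "autoscaling/v2beta2"


if __name__ == "__main__":
    for version in ("v1.22.15", "v1.23.0", "v1.24.3"):
        print(version, "->", hpa_api_version(version))
```

A cluster for which this returns autoscaling/v2beta2 would have to stay on KEDA 2.8.x until it is upgraded.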

@JorTurFer
Member

> Actually, the fork was not a solution, but rather a stop-gap (temporary) remediation to the problem. Is there any policy in place on what versions get patches and CVE fixes? I imagine a lot of users still are using 2.8 (or even older versions).

Usually we don't patch any released version. We started using release branches in v2.9 to have this option in the future, but atm we haven't done it yet (that's why you can't see a v2.8 release branch).
I understand your requirement because v2.9 prunes the supported versions, but I agree with @zroubalik and I'm not totally sure about the side effects or the effort required for doing that.
Let's wait for the opinions of other @kedacore/keda-core-contributors

@JorTurFer
Member

Exceptionally, we are going to release a hotfix release from branch v2.8 due to the breaking change introduced in the autoscaling API.
Are you open to help us with this? We need to identify all the fixes that we need to back port and then release them

@JorTurFer
Member

JorTurFer commented Jan 17, 2023

I have created this issue to track all the fixes to be ported, we need to identify and port them

@pedro-stanaka
Contributor

I can definitely help with KEDA itself. I'd rather not tinker with Helm since I don't use it and don't have much knowledge about it.

@JorTurFer
Member

> I can definitely help with KEDA itself. I'd rather not tinker with Helm since I don't use it and don't have much knowledge about it.

No worries, helm changes need to be done during the release process :)

@JorTurFer
Member

Fixed by #3694

@github-project-automation github-project-automation bot moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Jan 18, 2023
@JorTurFer JorTurFer moved this from Ready To Ship to Done in Roadmap - KEDA Core Mar 13, 2023
4 participants