Scaling up does not work correctly on Virtual Kubelet #906

Closed
3d0it opened this issue Jul 1, 2020 · 26 comments
Labels: bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

3d0it commented Jul 1, 2020

A deployment with replicas = 0 scales up to 1 not only when the threshold is reached, but as soon as the metric moves from 0 to 1
(ONLY on Virtual Kubelet; on physical nodes it works as expected).

Expected Behavior

The deployment should scale up when the threshold is reached, not when the metric changes from 0 to 1.

Actual Behavior

The deployment is scaled up as soon as the metrics query returns 1.

Steps to Reproduce the Problem

  1. Create a deployment with min replica count = 0
  2. Write a simple query with a threshold greater than 1 (for example 2)
  3. When the query returns 1 instead of 0, a new instance of the deployment is created

I attached my yaml file. In my scenario, I have a deployment with only one replica (deployment name pushengineweb) on physical nodes.
When a custom Prometheus query (a simple count of connected clients) reaches the threshold, I want to create a new pod on the Virtual Kubelet (deployment name pushengineweb-scale).

The same configuration, scaling on physical nodes instead of the Virtual Kubelet, works fine. A rough sketch of the ScaledObject is shown below.
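
For reference, a minimal sketch of what such a KEDA 1.x ScaledObject could look like. The real configuration is in the attached example.txt; the Prometheus serverAddress and query below are placeholders, while the metric name, threshold, and replica bounds match the HPA output later in this thread:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: pushengineweb-scaledobject
  namespace: default
  labels:
    deploymentName: pushengineweb-scale
spec:
  scaleTargetRef:
    deploymentName: pushengineweb-scale   # the deployment running on the Virtual Kubelet
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 0
  maxReplicaCount: 4
  triggers:
  - type: prometheus
    metadata:
      # serverAddress and query are placeholders, not taken from example.txt
      serverAddress: http://prometheus-operated.default.svc:9090
      metricName: pushengine_connected_client
      threshold: "2"
      query: sum(pushengine_connected_client)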

Specifications

  • KEDA Version: 1.4.1
  • Platform & Version: Azure AKS
  • Kubernetes Version: 1.16.9
  • Scaler(s): Prometheus - chart prometheus-operator-8.10.0 app version 0.36.0
  • Attachment: example.txt
3d0it added the bug label on Jul 1, 2020
tomkerkhove (Member) commented:

Hm, we don't control what Virtual Kubelet does; we merely handle 0 -> 1 scaling and the rest is up to the Kubernetes HPA. So if VK decides to kick in, that's because it made that decision.

Can you share the logs from KEDA please?
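
(A quick sketch of how those logs can be pulled, assuming a default KEDA 1.x install with the operator deployed as keda-operator in the keda namespace; adjust the names to your setup:)

# Tail the KEDA operator logs; deployment and namespace names assume a default install
kubectl logs -n keda deployment/keda-operator --tail=200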


3d0it commented Jul 1, 2020

Hi, this is the last part of the log. I recreated the case described in the issue a few minutes ago:

W0630 13:33:41.429075       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 23416111 (23417551)
W0630 14:04:11.452836       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 23421646 (23422593)
W0630 14:33:40.468099       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 23426687 (23427430)
W0630 14:59:45.483728       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 23431563 (23431769)
W0630 15:26:03.497028       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 23435863 (23436111)
{"level":"info","ts":1593532066.1006677,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593532066.1007018,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593532355.8509173,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593532355.8509517,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593558317.7918286,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593558317.7918668,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593594317.7921321,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593594317.7921786,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607721.0510402,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607721.051074,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607721.3310325,"logger":"controller_scaledobject","msg":"Updated HPA according to ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject","HPA.Namespace":"default","HPA.Name":"keda-hpa-pushengineweb-scale"}
{"level":"info","ts":1593607721.3315866,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607721.3316038,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607730.4269576,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607730.4270058,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607826.3706076,"logger":"scalehandler","msg":"Successfully updated deployment","ScaledObject.Namespace":"default","ScaledObject.Name":"pushengineweb-scaledobject","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"default","Deployment.Name":"pushengineweb-scale","Original Replicas Count":0,"New Replicas Count":1}
{"level":"info","ts":1593607837.8436048,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}
{"level":"info","ts":1593607837.8436441,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"default","Request.Name":"pushengineweb-scaledobject"}

Thanks


3d0it commented Jul 3, 2020

Hello, I am also adding the HPA description.

If I scale on physical nodes, this "Conditions" section is not logged.

kubectl describe horizontalpodautoscaler.autoscaling/keda-hpa-pushengineweb-scale
Name:                                                    keda-hpa-pushengineweb-scale
Namespace:                                               default
Labels:                                                  app.kubernetes.io/managed-by=keda-operator
                                                         app.kubernetes.io/name=keda-hpa-pushengineweb-scale
                                                         app.kubernetes.io/part-of=pushengineweb-scaledobject
                                                         app.kubernetes.io/version=1.4.1
Annotations:                                             <none>
CreationTimestamp:                                       Fri, 03 Jul 2020 08:54:10 +0200
Reference:                                               Deployment/pushengineweb-scale
Metrics:                                                 ( current / target )
  "pushengine_connected_client" (target average value):  <unknown> / 2
Min replicas:                                            1
Max replicas:                                            4
Deployment pods:                                         0 current / 0 desired
Conditions:
  Type           Status  Reason             Message
  ----           ------  ------             -------
  AbleToScale    True    SucceededGetScale  the HPA controller was able to get the target's current scale
  ScalingActive  False   ScalingDisabled    scaling is disabled since the replica count of the target is zero
Events:          <none>

Thanks


3d0it commented Jul 3, 2020

Hi, one more piece of information, in case it is useful: scaling down to zero as described in issue #524 works fine; only scaling up from zero does not work as expected.

mchmarny (Contributor) commented:

I can reliably reproduce this issue using this demo. Example log showing significant queue lag:

{"level":"debug","ts":1598799261.2208235,"logger":"kafka_scaler","msg":"Group autoscaling has a lag of 147410 for topic messages and partition 0\n"}
{"level":"debug","ts":1598799261.2208657,"logger":"scalehandler","msg":"Scaler for scaledObject is active","ScaledObject.Namespace":"default","ScaledObject.Name":"queue-outoscaling-scaler","ScaledObject.ScaleType":"deployment","Scaler":{}}
{"level":"debug","ts":1598799261.2310224,"logger":"scalehandler","msg":"ScaledObject's Status was properly updated","ScaledObject.Namespace":"default","ScaledObject.Name":"queue-outoscaling-scaler","ScaledObject.ScaleType":"deployment"}

And the HPA error:

W0830 14:54:22.057920       1 reflector.go:289] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: watch of *unstructured.Unstructured ended with: too old resource version: 1894736 (1894804)

tomkerkhove (Member) commented:

Can we look into this for v2 @zroubalik @ahmelsayed?

zroubalik (Member) commented:

To me this looks more like a problem in the internal Go client libraries and not in KEDA. I don't have access to a cluster with Virtual Kubelet, so I can't reproduce :(
But it might be solved in v2, since we have significantly updated the client-go version.


mchmarny commented Aug 31, 2020

Just to clarify, I'm able to reproduce this on a vanilla AKS cluster (v1.16.13 right now, although I've seen this on v1.17 as well). I've put together a script to reproduce this issue here.


yaron2 commented Aug 31, 2020

@zroubalik @tomkerkhove this seems to reproduce on non-Virtual-Kubelet clusters as @mchmarny mentioned. I think this needs a 1.x minor release with the updated Go client.

tomkerkhove (Member) commented:

In that case we should be good to go with v2 as @zroubalik mentioned!

Stay tuned, beta coming soon. Would you mind testing this out then?


mchmarny commented Aug 31, 2020

After updating the ScaledObject for v2 compatibility, the error is no longer logged and the HPA is successfully updated. Tested for the 0-1, 1-n, n-1, and 1-0 scenarios.

2020-08-31T19:29:35.146Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"Request.Namespace": "default", "Request.Name": "queue-outoscaling-scaler"}
2020-08-31T19:29:35.205Z	INFO	controllers.ScaledObject	Updated HPA according to ScaledObject	{"Request.Namespace": "default", "Request.Name": "queue-outoscaling-scaler", "HPA.Namespace": "default", "HPA.Name": "keda-hpa-queue-outoscaling-scaler"}
2020-08-31T19:29:35.205Z	INFO	controllers.ScaledObject	Initializing Scaling logic according to ScaledObject Specification	{"Request.Namespace": "default", "Request.Name": "queue-outoscaling-scaler"}
2020-08-31T19:29:35.249Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject minReplicaCount	{"scaledobject.Name": "queue-outoscaling-scaler", "scaledObject.Namespace": "default", "scaleTarget.Name": "queue-outoscaling-subscriber", "ScaleTarget.Replicas": 1}

BTW, can you add topic lag back in on ScaledObject reconciling? That log entry was super helpful in debugging in v1.5.

WRT signaling this to users: given this impacts everyone using Helm to deploy onto the default k8s version in most CSPs right now, I agree with @yaron2 that a 1.x update is needed ASAP. Additionally, it may be worth indicating which version of k8s the current KEDA release works with reliably, and pointing users to the deployment from head as a stopgap for anything after.


zroubalik commented Sep 1, 2020

@mchmarny thanks for the confirmation! I am glad it is fixed.

BTW, can you add topic lag back in on ScaledObject reconciling? That log entry was super helpful in debugging in v1.5.

The log is still there; have you tried enabling the debug log level? (FYI, this was changed as well in v2: https://github.com/kedacore/keda#setting-log-levels)

Updating the Go client libraries in v1 would be non-trivial; we encourage users to migrate to v2 as soon as it is out. As for the supported k8s version, users should be able to run KEDA on k8s 1.13+; this problem is probably not in Kubernetes itself but in the client libraries, IMHO :)
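
For anyone following along, a rough sketch of where that debug level is typically set in v2, assuming the default operator Deployment and container names (the linked README is the authoritative reference):

# Sketch: add the zap flags to the keda-operator container args, e.g. via
# kubectl edit deployment keda-operator -n keda
    spec:
      template:
        spec:
          containers:
          - name: keda-operator
            args:
            - --zap-log-level=debug   # error / info / debug
            - --zap-encoder=console   # console or json output format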

mchmarny (Contributor) commented:

I was excited to see KEDA v2.0.0-beta come out, so I've retested this issue with the new version. I'm still seeing the same issue: KEDA sees the correct lag but does not scale the target.

I've updated the demo so it's easier to reproduce. I'm seeing this on AKS Kubernetes version 1.16.13.

2020-09-14T02:05:52.300Z	INFO	controllers.ScaledObject	Initializing Scaling logic according to ScaledObject Specification	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "queue-depth-scaler"}
2020-09-14T02:05:52.300Z	DEBUG	controllers.ScaledObject	ScaledObject is defined correctly and is ready for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "queue-depth-scaler"}
2020-09-14T02:05:52.306Z	DEBUG	controller	Successfully Reconciled	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "queue-depth-scaler", "namespace": "default"}
2020-09-14T02:05:52.306Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "queue-depth-scaler"}
2020-09-14T02:05:52.306Z	DEBUG	controllers.ScaledObject	Parsed Group, Version, Kind, Resource	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "queue-depth-scaler", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2020-09-14T02:05:52.345Z	DEBUG	controllers.ScaledObject	ScaledObject is defined correctly and is ready for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "queue-depth-scaler"}
2020-09-14T02:05:52.347Z	DEBUG	kafka_scaler	Group autoscaling-subscriber has a lag of 508 for topic metrics and partition 0

2020-09-14T02:05:52.347Z	DEBUG	scalehandler	Scaler for scaledObject is active	{"Metrics Name": "kafka-metrics-autoscaling-subscriber"}
2020-09-14T02:05:52.351Z	DEBUG	controller	Successfully Reconciled	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "queue-depth-scaler", "namespace": "default"}
2020-09-14T02:05:52.364Z	DEBUG	scalehandler	Watching with pollingInterval	{"type": "ScaledObject", "namespace": "default", "name": "queue-depth-scaler", "PollingInterval": "10s"}
2020-09-14T02:06:02.413Z	DEBUG	kafka_scaler	Group autoscaling-subscriber has a lag of 507 for topic metrics and partition 0

tomkerkhove (Member) commented:

Can you have a look please @ahmelsayed ?

@mchmarny can you share some info on what you are seeing? Does it stay at 0 instances, or what happens?

zroubalik (Member) commented:

@mchmarny it would be great if you could take a look at the KEDA Metrics Server logs, to check whether there is some problem.

mchmarny (Contributor) commented:

The configured HPA doesn't seem to take effect. I'm seeing KEDA doing its 0-1 and 1-0 scaling on the target just fine. As the lag increases, however, the correct queue lag is logged by the operator, but the configured HPA does not scale the target beyond 1, even when its target is 3. This is all on AKS with Kubernetes version v1.16.13.

Here is the scaled object I'm using to replicate this issue:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: subscriber-scaler
spec:
  scaleTargetRef:
    name: autoscaling-subscriber
  pollingInterval: 15
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 30
  triggers:
  - type: kafka
    metadata:
      topic: metrics
      bootstrapServers: kafka-cp-kafka.kafka.svc.cluster.local:9092
      consumerGroup: autoscaling-subscriber
      lagThreshold: "3"

Controller logs after I apply it:

2020-09-14T12:38:06.812Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.812Z	INFO	controllers.ScaledObject	Adding Finalizer for the ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.837Z	DEBUG	controllers.ScaledObject	Adding scaledObjectName label on ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "value": "subscriber-scaler"}
2020-09-14T12:38:06.843Z	DEBUG	controllers.ScaledObject	Parsed Group, Version, Kind, Resource	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2020-09-14T12:38:06.855Z	INFO	controllers.ScaledObject	Detected resource targeted for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "resource": "apps/v1.Deployment", "name": "autoscaling-subscriber"}
2020-09-14T12:38:06.855Z	INFO	controllers.ScaledObject	Creating a new HPA	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "HPA.Namespace": "default", "HPA.Name": "keda-hpa-subscriber-scaler"}
2020-09-14T12:38:06.888Z	DEBUG	controllers.ScaledObject	Notify scaleHandler of an update in scaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.888Z	INFO	controllers.ScaledObject	Initializing Scaling logic according to ScaledObject Specification	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.888Z	DEBUG	controllers.ScaledObject	ScaledObject is defined correctly and is ready for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.895Z	DEBUG	controller	Successfully Reconciled	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "subscriber-scaler", "namespace": "default"}
2020-09-14T12:38:06.895Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.895Z	DEBUG	controllers.ScaledObject	Parsed Group, Version, Kind, Resource	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2020-09-14T12:38:06.938Z	DEBUG	controllers.ScaledObject	ScaledObject is defined correctly and is ready for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:38:06.939Z	DEBUG	kafka_scaler	Group autoscaling-subscriber has a lag of 0 for topic metrics and partition 0
2020-09-14T12:38:06.944Z	DEBUG	controller	Successfully Reconciled	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "subscriber-scaler", "namespace": "default"}
2020-09-14T12:38:06.948Z	INFO	scaleexecutor	Successfully scaled ScaleTarget to 0 replicas	{"scaledobject.Name": "subscriber-scaler", "scaledObject.Namespace": "default", "scaleTarget.Name": "autoscaling-subscriber"}
2020-09-14T12:38:06.955Z	DEBUG	scalehandler	Watching with pollingInterval	{"type": "ScaledObject", "namespace": "default", "name": "subscriber-scaler", "PollingInterval": "15s"}

And the created HPA:

Name:                                                             keda-hpa-subscriber-scaler
Namespace:                                                        default
Labels:                                                           app.kubernetes.io/managed-by=keda-operator
                                                                  app.kubernetes.io/name=keda-hpa-subscriber-scaler
                                                                  app.kubernetes.io/part-of=subscriber-scaler
                                                                  app.kubernetes.io/version=v2
                                                                  scaledObjectName=subscriber-scaler
Annotations:                                                      <none>
CreationTimestamp:                                                Mon, 14 Sep 2020 05:38:06 -0700
Reference:                                                        Deployment/autoscaling-subscriber
Metrics:                                                          ( current / target )
  "kafka-metrics-autoscaling-subscriber" (target average value):  <unknown> / 3
Min replicas:                                                     1
Max replicas:                                                     10
Deployment pods:                                                  0 current / 0 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededGetScale   the HPA controller was able to get the target's current scale
  ScalingActive   False   ScalingDisabled     scaling is disabled since the replica count of the target is zero
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>

And when the lag increases:

2020-09-14T12:41:22.586Z	DEBUG	kafka_scaler	Group autoscaling-subscriber has a lag of 252 for topic metrics and partition 0
2020-09-14T12:41:22.586Z	DEBUG	scalehandler	Scaler for scaledObject is active	{"Metrics Name": "kafka-metrics-autoscaling-subscriber"}
2020-09-14T12:41:22.597Z	INFO	scaleexecutor	Successfully updated ScaleTarget	{"scaledobject.Name": "subscriber-scaler", "scaledObject.Namespace": "default", "scaleTarget.Name": "autoscaling-subscriber", "Original Replicas Count": 0, "New Replicas Count": 1}
2020-09-14T12:41:25.993Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:41:25.993Z	DEBUG	controllers.ScaledObject	Parsed Group, Version, Kind, Resource	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2020-09-14T12:41:26.017Z	DEBUG	controllers.ScaledObject	ScaledObject is defined correctly and is ready for scaling	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "subscriber-scaler"}
2020-09-14T12:41:26.023Z	DEBUG	controller	Successfully Reconciled	{"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "subscriber-scaler", "namespace": "default"}

The HPA is getting updated:

NAME                         REFERENCE                           TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-subscriber-scaler   Deployment/autoscaling-subscriber   3/3 (avg)   1         10        1          17m

But the replica count stays at 1, even when I let the lag build up for a while:

NAME                                      READY   STATUS    RESTARTS   AGE
autoscaling-subscriber-65c484f9c5-jrlzl   2/2     Running   0          6m

Note, I'm not seeing the k8s client errors like I did before.

Here are the Metrics API server logs:

I0914 12:08:17.300550       1 main.go:84] keda_metrics_adapter "msg"="KEDA Version: v2"
I0914 12:08:17.300630       1 main.go:85] keda_metrics_adapter "msg"="KEDA Commit: 717aab81817f0d9b6d02b0e6d1499367c751faf2"
I0914 12:08:17.300826       1 main.go:86] keda_metrics_adapter "msg"="Go Version: go1.15.1"
I0914 12:08:17.300844       1 main.go:87] keda_metrics_adapter "msg"="Go OS/Arch: linux/amd64"
I0914 12:08:18.951620       1 request.go:621] Throttling request took 1.048362821s, request: GET:https://10.0.0.1:443/apis/keda.sh/v1alpha1?timeout=32s
I0914 12:08:19.804392       1 provider.go:49] keda_metrics_adapter/provider "msg"="starting"
I0914 12:08:19.804438       1 main.go:115] keda_metrics_adapter "msg"="starting adapter..."
2020-09-14 12:08:19.804873 I | Starting metrics server at :9022
I0914 12:08:20.376400       1 serving.go:307] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I0914 12:08:20.858983       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0914 12:08:20.859030       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0914 12:08:20.859032       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0914 12:08:20.859048       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0914 12:08:20.859253       1 dynamic_serving_content.go:130] Starting serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key
I0914 12:08:20.859743       1 secure_serving.go:178] Serving securely on [::]:6443
I0914 12:08:20.859787       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0914 12:08:20.959293       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0914 12:08:20.959381       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file


zroubalik commented Sep 14, 2020

@mchmarny thanks! Seems like the problem is definitely somewhere between HPA and Metrics Server. Would you mind posting Metrics Server Debug logs? https://github.com/kedacore/keda/blob/v2/BUILD.md#metrics-server-logging

UPDATE: we should be able to see this message, to confirm that the request from HPA was correctly delivered to Metrics Server: https://github.com/kedacore/keda/blob/v2/pkg/provider/provider.go#L61


mchmarny commented Sep 14, 2020

Yep, that line is in there:

I0914 13:30:48.230028       1 provider.go:61] keda_metrics_adapter/provider "msg"="Keda provider received request for external metrics"  "metric name"="kafka-metrics-autoscaling-subscriber" "metricSelector"="scaledObjectName=subscriber-scaler" "namespace"="default"


zroubalik commented Sep 14, 2020

@mchmarny Could you please try to get the metrics directly, to see what the returned value is? (We should improve logging in the Metrics Server anyway.)

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka-metrics-autoscaling-subscriber?labelSelector=scaledObjectName%3Dsubscriber-scaler"

zroubalik (Member) commented:

That's ^ basically the request that the HPA is sending.

mchmarny (Contributor) commented:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka-metrics-autoscaling-subscriber"
  },
  "items": [
    {
      "metricName": "kafka-metrics-autoscaling-subscriber",
      "metricLabels": null,
      "timestamp": "2020-09-14T14:17:51Z",
      "value": "3"
    }
  ]
}

mchmarny (Contributor) commented:

This is with the lag at around 400, so 3 replicas is what I would expect:

2020-09-14T14:18:42.541Z	DEBUG	scalehandler	Scaler for scaledObject is active	{"Metrics Name": "kafka-metrics-autoscaling-subscriber"}
2020-09-14T14:18:57.602Z	DEBUG	kafka_scaler	Group autoscaling-subscriber has a lag of 358 for topic metrics and partition 0

zroubalik (Member) commented:

@mchmarny I can see the problem: you have set up just 1 Kafka partition. KEDA will not scale out beyond the number of partitions, because having more consumers than partitions doesn't make sense. I bet that if you increase the number of partitions, the scaling will work as expected (see the sketch below).
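
A quick sketch of how the topic could be repartitioned, assuming the standard Kafka CLI and the topic and bootstrap address from the ScaledObject above (partition counts can only be increased, not decreased):

# Raise the partition count of the "metrics" topic so the consumer group can scale out
kafka-topics.sh --bootstrap-server kafka-cp-kafka.kafka.svc.cluster.local:9092 \
  --alter --topic metrics --partitions 10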

mchmarny (Contributor) commented:

Oh yes, the one-consumer-per-partition limit within a group... I was swapping things around to make this more reproducible and forgot to change it back after last time. Thanks.


stale bot commented Oct 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Oct 13, 2021.

stale bot commented Oct 20, 2021

This issue has been automatically closed due to inactivity.

The stale bot closed this as completed on Oct 20, 2021.
SpiritZhou pushed a commit to SpiritZhou/keda that referenced this issue Jul 18, 2023