
[Datadog scaler] Scaler returns fillValue if the last data point of query is null #3906

Closed
dogzzdogzz opened this issue Nov 23, 2022 · 8 comments · Fixed by #3954 or kedacore/keda-docs#1002
Labels
bug Something isn't working

Comments

@dogzzdogzz
Contributor


I reported a similar issue (3448) before; the root cause there was that a null latest data point caused the exception. Recently I finally had time to test the query again. This time I made sure all data points have data and that both queries return the same number of data points within the period.

Query 1: sum:trace.express.request.hits{service:foo,env:bar}.as_rate(). KEDA can get the metrics of this query without any problem.

# curl result of query 1
{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1669106969000,
      "attributes": {},
      "metric": "trace.express.request.hits",
      "interval": 10,
      "tag_set": [],
      "start": 1669106910000,
      "length": 6,
      "query_index": 0,
      "aggr": "sum",
      "scope": "env:bar,service:foo",
      "pointlist": [
        [
          1669106910000,
          417.3
        ],
        [
          1669106920000,
          465.4
        ],
        [
          1669106930000,
          447.1
        ],
        [
          1669106940000,
          440.9
        ],
        [
          1669106950000,
          440.8
        ],
        [
          1669106960000,
          748.5000000000001
        ]
      ],
      "expression": "sum:trace.express.request.hits{env:bar,service:foo}.as_rate()",
      "unit": [
        {
          "family": "cache",
          "scale_factor": 1,
          "name": "hit",
          "short_name": null,
          "plural": "hits",
          "id": 39
        },
        {
          "family": "time",
          "scale_factor": 1,
          "name": "second",
          "short_name": "s",
          "plural": "seconds",
          "id": 11
        }
      ],
      "display_name": "trace.express.request.hits"
    }
  ],
  "to_date": 1669106966000,
  "query": "sum:trace.express.request.hits{service:foo,env:bar}.as_rate()",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1669106906000,
  "group_by": [],
  "values": []
}

Query 2: avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10). KEDA can get the metrics of this query without any problem.

# curl result of query 2
{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1669106969000,
      "attributes": {},
      "metric": "kubernetes.cpu.requests",
      "interval": 10,
      "tag_set": [],
      "start": 1669106910000,
      "length": 6,
      "query_index": 0,
      "aggr": "avg",
      "scope": "env:bar,service:foo",
      "pointlist": [
        [
          1669106910000,
          0.11999999731779099
        ],
        [
          1669106920000,
          0.11999999731779099
        ],
        [
          1669106930000,
          0.11999999731779099
        ],
        [
          1669106940000,
          0.11999999731779099
        ],
        [
          1669106950000,
          0.11999999731779099
        ],
        [
          1669106960000,
          0.11999999731779099
        ]
      ],
      "expression": "avg:kubernetes.cpu.requests{env:bar,service:foo}.rollup(10)",
      "unit": [
        {
          "family": "cpu",
          "scale_factor": 1,
          "name": "core",
          "short_name": null,
          "plural": "cores",
          "id": 31
        },
        null
      ],
      "display_name": "kubernetes.cpu.requests"
    }
  ],
  "to_date": 1669106966000,
  "query": "avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1669106906000,
  "group_by": [],
  "values": []
}

If I combine the above two queries into "query 1 / query 2":
Query 3: sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)
The curl result looks OK, but KEDA fails to get the metrics.

# curl result of query 3
{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1669106969000,
      "attributes": {},
      "metric": "(trace.express.request.hits / kubernetes.cpu.requests)",
      "interval": 10,
      "tag_set": [],
      "start": 1669106910000,
      "length": 6,
      "query_index": 0,
      "aggr": "sum",
      "scope": "env:bar,service:foo",
      "pointlist": [
        [
          1669106910000,
          3477.500077728184
        ],
        [
          1669106920000,
          3878.3334200208405
        ],
        [
          1669106930000,
          3725.8334166122
        ],
        [
          1669106940000,
          3674.166748790693
        ],
        [
          1669106950000,
          3673.3334154387335
        ],
        [
          1669106960000,
          6237.500139418993
        ]
      ],
      "expression": "(sum:trace.express.request.hits{env:bar,service:foo}.as_rate() / avg:kubernetes.cpu.requests{env:bar,service:foo}.rollup(10))",
      "unit": null,
      "display_name": "(trace.express.request.hits / kubernetes.cpu.requests)"
    }
  ],
  "to_date": 1669106966000,
  "query": "sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1669106906000,
  "group_by": [],
  "values": []
}

HPA events

  Warning  FailedGetExternalMetric       31m (x8 over 23h)  horizontal-pod-autoscaler  unable to get external metric development/s0-datadog-sum-trace-express-request-hits/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: foo,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: no matching metrics found for s0-datadog-sum-trace-express-request-hits
  Warning  FailedComputeMetricsReplicas  31m (x8 over 23h)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get s0-datadog-sum-trace-express-request-hits external metric: unable to get external metric development/s0-datadog-sum-trace-express-request-hits/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: foo,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: no matching metrics found for s0-datadog-sum-trace-express-request-hits

If I remove as_rate() from Query 3:
Query 4: sum:trace.express.request.hits{service:foo,env:bar}/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)
KEDA can get the metrics without any problem

# curl result of query 4
{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1669106960000,
      "attributes": {},
      "metric": "(trace.express.request.hits / kubernetes.cpu.requests)",
      "interval": 1,
      "tag_set": [],
      "start": 1669106910000,
      "length": 6,
      "query_index": 0,
      "aggr": "sum",
      "scope": "service:foo,env:bar",
      "pointlist": [
        [
          1669106910000,
          64426.34634984409
        ],
        [
          1669106920000,
          66898.41915361531
        ],
        [
          1669106930000,
          65685.75394920415
        ],
        [
          1669106940000,
          66132.75977946246
        ],
        [
          1669106950000,
          66287.59952733
        ],
        [
          1669106960000,
          66472.42117646785
        ]
      ],
      "expression": "(sum:trace.express.request.hits{service:foo,env:bar} / avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10))",
      "unit": null,
      "display_name": "(trace.express.request.hits / kubernetes.cpu.requests)"
    }
  ],
  "to_date": 1669106966000,
  "query": "sum:trace.express.request.hits{service:foo,env:bar}/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1669106906000,
  "group_by": [],
  "values": []
}

Expected Behavior

KEDA should be able to get the metrics without any problem as long as the same query works with curl.

Actual Behavior

Explained above

Steps to Reproduce the Problem

Explained above

Logs from KEDA operator

E1123 09:05:17.876484       1 datadog_scaler.go:318] keda_metrics_adapter/datadog_scaler "msg"="error getting metrics from Datadog" "error"="error when retrieving Datadog metrics: Get \"https://api.datadoghq.com/api/v1/query?from=1669194254&query=sum%3Atrace.express.request.hits%7Bservice%3Afoo%2Cenv%3Abar%7D.as_rate%28%29%2Favg%3Akubernetes.cpu.requests%7Bservice%3Afoo%2Cenv%3Abar%7D.rollup%2810%29&to=1669194314\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" "name"="foo" "namespace"="bar" "type"="ScaledObject"

KEDA Version

2.8.1

Kubernetes Version

< 1.23

Platform

Amazon Web Services

Scaler Details

Datadog

Anything else?

No response

@dogzzdogzz dogzzdogzz added the bug Something isn't working label Nov 23, 2022
@dogzzdogzz dogzzdogzz changed the title [Datadog scaler]. Failed to get the metrics for some specific datadog query [Datadog scaler] Failed to get the metrics for some specific datadog query Nov 23, 2022
@JorTurFer
Member

JorTurFer commented Nov 24, 2022

Hi @dogzzdogzz ,
I have tried your JSONs injecting them into the datadog SDK and I can't reproduce the issue, queries 3 and 4 work without any problem. Are you sure that there isn't any strange situation like +Inf or -Inf, or IDK, but using those values I can't reproduce the issue. Is there any new logs related with this?

@JorTurFer JorTurFer reopened this Nov 24, 2022
@JorTurFer
Member

BTW, I closed the issue by mistake and have reopened it.

@dogzzdogzz
Contributor Author

@JorTurFer Hmm... it's strange that you can't reproduce it; it's happening on all of my clusters. To clarify the reproduction, I just want to double-check: do you have the metric trace.express.request.hits in your Datadog account, and does it have data ingested from the Datadog agent?

And could you kindly check whether anything in my ScaledObject manifests could be causing this issue?

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    meta.helm.sh/release-name: foo
    meta.helm.sh/release-namespace: bar
  labels:
    app: foo
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: foo
    helm.toolkit.fluxcd.io/namespace: bar
    scaledobject.keda.sh/name: query-1
  name: query-1
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}.as_rate()'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-2
  name: query-2
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-3
  name: query-3
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-4
  name: query-4
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog

Below is the HPA status for the above manifests. You can see that query-1/2/4 get the data without problems, but the metric of query-3 is always 0 because of the metricUnavailableValue config.

# kubectl get hpa keda-hpa-query-1 keda-hpa-query-2 keda-hpa-query-3 keda-hpa-query-4
NAME               REFERENCE                TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-query-1   Deployment/foo   193500m/250 (avg)     1         1         1          7m3s
keda-hpa-query-2   Deployment/foo   119m/250 (avg)        1         1         1          7m2s
keda-hpa-query-3   Deployment/foo   0/250 (avg)           1         1         1          7m2s
keda-hpa-query-4   Deployment/foo   62975001m/250 (avg)   1         1         1          7m1s
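
For context on why query-3 shows 0 rather than an error: metricUnavailableValue (the fill value) makes the scaler report a fixed fallback when the query doesn't return a usable number, and with the current behavior a null last data point counts as "unavailable". A minimal sketch of that pattern, with made-up names and not the actual KEDA code:

package main

import "fmt"

// point mirrors one entry of the Datadog "pointlist": [timestamp, value],
// where the value may be null in JSON (nil here).
type point struct {
    ts    float64
    value *float64
}

// lastValueOrFill returns the value of the last data point, or fillValue when
// that point is missing or null; this is the behavior that makes query-3 report 0.
func lastValueOrFill(points []point, fillValue float64) float64 {
    if len(points) == 0 || points[len(points)-1].value == nil {
        return fillValue
    }
    return *points[len(points)-1].value
}

func main() {
    v := 2144.17
    points := []point{{1670221180000, &v}, {1670221190000, nil}, {1670221200000, nil}}
    fmt.Println(lastValueOrFill(points, 0)) // prints 0: the trailing null triggers the fallback
}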

@JorTurFer
Member

No no, I just tried modifying the client to return the JSON you sent. I haven't tried with Datadog directly because I don't have anything working with Datadog. If you could share all the manifests needed to spin up a scenario that reproduces your issue, I can use our Datadog account to try it. (Sorry, I have zero expertise with Datadog and don't know how to generate the same scenario.)

I can install the Datadog agent on my cluster; then what do I need to deploy to generate those metrics?

@JorTurFer
Member

Below is the HPA status for the above manifests. You can see that query-1/2/4 get the data without problems, but the metric of query-3 is always 0 because of the metricUnavailableValue config.

I remember that the original issue was a panic while retrieving metrics, not just a fallback to 0. Is it the same issue this time? I mean, a fallback to 0 could simply mean, for example, that the time window is too small to retrieve any metrics. That's not the same behavior as a panic in the scaler.

@dogzzdogzz
Contributor Author

No no, I just tried modifying the client to return the JSON you sent. I haven't tried with Datadog directly because I don't have anything working with Datadog. If you could share all the manifests needed to spin up a scenario that reproduces your issue, I can use our Datadog account to try it. (Sorry, I have zero expertise with Datadog and don't know how to generate the same scenario.)

I can install the Datadog agent on my cluster; then what do I need to deploy to generate those metrics?

Oh OK. trace.express.request.hits requires installing a dd-trace APM library in some service because it's an HTTP request count metric; I think that might be too much trouble for you. Let me check whether I can reproduce it with some other, more common metric that already exists in Datadog.

I remember that the original issue was a panic while retrieving metrics, not just a fallback to 0. Is it the same issue this time? I mean, a fallback to 0 could simply mean, for example, that the time window is too small to retrieve any metrics. That's not the same behavior as a panic in the scaler.

I think it's not the same as issue 3448, because I already used rollup(10) to avoid null data points.

@dogzzdogzz
Contributor Author

@JorTurFer I updated my curl script to make sure the TO time is the current time, and finally found out that the last one or two data points in the response are always null. I'll create a PR to fix this issue.

TO=$(date +%s) && \
FROM=$(($TO - 60)) && \
curl -X GET "https://api.datadoghq.com/api/v1/query?from=$FROM&to=$TO&query=sum:trace.express.request.hits\{service:foo,env:bar\}.as_rate()/avg:kubernetes.cpu.requests\{service:foo,env:bar\}.rollup(10)"

Response

{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1670221209000,
      "attributes": {},
      "metric": "(trace.express.request.hits / kubernetes.cpu.requests)",
      "interval": 10,
      "tag_set": [],
      "start": 1670221160000,
      "length": 5,
      "query_index": 0,
      "aggr": "sum",
      "scope": "env:bar,service:foo",
      "pointlist": [
        [
          1670221160000,
          2276.6667175541334
        ],
        [
          1670221170000,
          2467.5000551529242
        ],
        [
          1670221180000,
          2144.1667145925276
        ],
        [
          1670221190000,
          null
        ],
        [
          1670221200000,
          null
        ]
      ],
      "expression": "(sum:trace.express.request.hits{env:bar,service:foo}.as_rate() / avg:kubernetes.cpu.requests{env:bar,service:foo}.rollup(10))",
      "unit": null,
      "display_name": "(trace.express.request.hits / kubernetes.cpu.requests)"
    }
  ],
  "to_date": 1670221212000,
  "query": "sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1670221152000,
  "group_by": [],
  "values": []
}
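
The exact fix isn't spelled out here, but a natural approach, sketched below purely as an assumption about the upcoming PR, is to walk the pointlist backwards and use the most recent non-null value instead of blindly taking the last entry:

package main

import (
    "errors"
    "fmt"
)

// latestNonNull walks a pointlist's values from the end and returns the most
// recent non-null one, so trailing nulls no longer force the fill value.
func latestNonNull(values []*float64) (float64, error) {
    for i := len(values) - 1; i >= 0; i-- {
        if values[i] != nil {
            return *values[i], nil
        }
    }
    return 0, errors.New("no non-null data points in response")
}

func main() {
    v1, v2 := 2467.50, 2144.17
    values := []*float64{&v1, &v2, nil, nil} // the last two points are null, as in the response above
    val, err := latestNonNull(values)
    if err != nil {
        panic(err)
    }
    fmt.Println(val) // 2144.17
}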

@dogzzdogzz dogzzdogzz changed the title [Datadog scaler] Failed to get the metrics for some specific datadog query [Datadog scaler] Scaler returns fillValue if the last data point of query is null Dec 5, 2022
@dogzzdogzz
Contributor Author

dogzzdogzz commented Dec 8, 2022

For anyone who encounters this issue as well: there are some additional issues and details about the Datadog API response mentioned in this comment:
#3954 (comment)
