From 3f4827bb2c5e8ca11e35a37e741c76912230ebc5 Mon Sep 17 00:00:00 2001
From: dgibbard-cisco <57677847+dgibbard-cisco@users.noreply.github.com>
Date: Mon, 12 Sep 2022 20:29:00 +0100
Subject: [PATCH] Documentation for Datadog Multi-Query Support (#885)

---
 content/docs/2.9/scalers/datadog.md | 80 +++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/content/docs/2.9/scalers/datadog.md b/content/docs/2.9/scalers/datadog.md
index d63a40495..55939afeb 100644
--- a/content/docs/2.9/scalers/datadog.md
+++ b/content/docs/2.9/scalers/datadog.md
@@ -24,6 +24,7 @@ triggers:
     query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()"
     queryValue: "7.75"
     activationQueryValue: "1.1"
+    queryAggregator: "max"
     type: "global" # Deprecated in favor of trigger.metricType
     age: "120"
     metricUnavailableValue: "1.5"
@@ -34,6 +35,7 @@ triggers:
 - `query` - The Datadog query to run.
 - `queryValue` - Value to reach to start scaling (This value can be a float).
 - `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float)
+- `queryAggregator` - When `query` contains multiple comma-separated queries, this sets how the individual results are aggregated. (Values: `max`, `average`, Required only when `query` contains multiple queries)
 - `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional)
 - `age`: The time window (in seconds) to retrieve metrics from Datadog. (Default: `90`, Optional)
 - `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. (Optional, This value can be a float)
@@ -137,3 +139,81 @@
 which by default is 15 seconds.
 
 For example, if the `kube-controller-manager` was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N.
+
+## Multi-Query Support
+
+To reduce issues with Datadog API rate limiting, it is possible to send a single query that contains multiple comma-separated queries.
+When doing this, the results from each query are aggregated based on the `queryAggregator` value (e.g. `max` or `average`).
+
+> 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query.
+
+### Example 1 - Aggregating Similar Metrics
+
+Simple aggregation works well when you want to scale on more than one metric with similar return values or scale (i.e. where multiple metrics can share a single `queryValue` and still make sense).
+
+```yaml
+apiVersion: keda.sh/v1alpha1
+kind: ScaledObject
+metadata:
+  name: datadog-scaledobject
+  namespace: my-project
+spec:
+  scaleTargetRef:
+    name: worker
+  triggers:
+  - type: datadog
+    metricType: "AverageValue"
+    metadata:
+      # Comma-separated queries count as a single API call:
+      query: "per_second(sum:http.requests{service:myservice1}.rollup(max, 300)),per_second(sum:http.requests{service:myservice1}.rollup(avg, 600))"
+      # The target value, applied to the aggregated result, used to scale the TargetRef
+      queryValue: "100"
+      # How to aggregate results from multi-query queries. Default: 'max'
+      queryAggregator: "average"
+      # Optional: The time window (in seconds) to retrieve metrics from Datadog. Default: 90
+      age: "600"
+      # Optional: The metric value to return to the HPA if a metric value wasn't found for the specified time window
+      metricUnavailableValue: "0"
+    authenticationRef:
+      name: keda-trigger-auth-datadog-secret
+```
+
+The example above looks at the `http.requests` value for a service, taking two views of the same metric (max vs. avg, over different time windows), and then scales on the average of the two results.
+
+This works particularly well when scaling against the same metric with slightly different parameters, or with functions such as `week_before()`.
+
+### Example 2 - Driving scale directly
+
+When you want to scale on dissimilar metrics while still benefiting from the reduced API calls of multi-query support, the easiest approach is to make each query directly return the desired scale (e.g. the number of pods), and then `max` or `average` the results to get the target scale.
+
+This can be done by adding arithmetic to the queries, so that each one directly returns the number of pods that should be running.
+
+Following this pattern, and then setting `queryValue: 1` and `metricType: AverageValue`, results in the desired number of pods being spawned directly from the results of the metric queries.
+
+```yaml
+apiVersion: keda.sh/v1alpha1
+kind: ScaledObject
+metadata:
+  name: datadog-scaledobject
+  namespace: my-project
+spec:
+  scaleTargetRef:
+    name: worker
+  triggers:
+  - type: datadog
+    # `AverageValue` tracks the query results divided by the number of running containers
+    metricType: "AverageValue"
+    metadata:
+      # Comma-separated queries count as a single API call:
+      ## This example targets "http.requests" at 180 requests-per-second per-pod,
+      ## and an "http.backlog" size of 30 per-pod
+      query: "per_second(sum:http.requests{service:myservice1}.rollup(max, 300))/180,per_second(sum:http.backlog{service:myservice1}.rollup(max, 300))/30"
+      # Setting queryValue to 1 and metricType to "AverageValue" allows the metric to dictate the number of pods from its own arithmetic.
+      queryValue: "1"
+      # How to aggregate results from multi-query queries. Default: 'max'
+      queryAggregator: "max"
+    authenticationRef:
+      name: keda-trigger-auth-datadog-secret
+```
+
+Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query results in a value of `3`. With the `max` aggregator set, the scaler will set the target scale to `3`, as that is the highest value across all returned queries.
\ No newline at end of file