Duplicate SLO created for each DatadogSLO #1062

Open
jeff-jsq opened this issue Jan 31, 2024 · 3 comments
Labels
bug Something isn't working waiting-user-feedback

Comments

@jeff-jsq

Describe what happened:
I've confirmed I'm only running a single Datadog operator in my K8s cluster, but it seems each DatadogSLO creates multiple SLOs in Datadog.

Running 1.3.0 of the operator, creating an example DatadogSLO:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogSLO
metadata:
  name: text-xyz
  namespace: test
spec:
  description: Error SLO for test-xyz
  name: Error SLO for test-xyz
  query:
    denominator: sum:trace.pyramid.request.hits{service:test-xyz, env:test}.as_count()
    numerator: sum:trace.pyramid.request.hits{service:test-xyz, env:test}.as_count()
      - sum:trace.pyramid.request.errors{service:test-xyz, env:test}.as_count()
  tags:
  - integration:kubernetes
  - service:test-xyz
  - env:test
  - team:sre
  - generated:kubernetes
  targetThreshold: 99500m
  timeframe: 7d
  type: metric

results in multiple SLOs being created in Datadog:

[Screenshot attached: CleanShot 2024-01-31 at 11 35 23@2x]

Deleting the DatadogSLO results in one of the SLOs being orphaned in Datadog.

Describe what you expected:

I expect a single DatadogSLO resource to result in a single SLO created in Datadog.

Steps to reproduce the issue:

Install the Datadog Operator via Helm (chart version 1.4.1) with the following values:

datadogCRDs:
  crds:
    datadogSLOs: true
apiKeyExistingSecret: datadog-secret
appKeyExistingSecret: datadog-secret
datadogMonitor:
  enabled: true
datadogSLO:
  enabled: true
site: datadoghq.com
watchNamespaces:
- ""

Apply the example DatadogSLO above with kubectl apply.

Additional environment details (Operating System, Cloud provider, etc):

@khewonc
Contributor

khewonc commented Feb 1, 2024

Hi, thanks for reporting this. We'll look into this on our end to see why multiple SLOs are getting created.

@paulbrassard-figure

I've also seen this issue using the 1.8.3 helm chart with the 1.7.0 operator.

Additionally, I was using Kyverno with a generate policy for DatadogSLOs and synchronization turned on. My target threshold was set to "99.0" and the datadog-operator controller would change it to "99", which caused Kyverno and the datadog-operator to fight back and forth changing it. The result was that I had around 40 duplicate SLOs as described in this issue. I only add all this to say that it seems that this problem gets exacerbated by updating the resource.
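
To illustrate why the two controllers could keep fighting: assuming `targetThreshold` is stored as a Kubernetes-style quantity (the `99500m` in the original report suggests this), `"99.0"` and `"99"` denote the same value but differ as strings, so any sync that compares the serialized text would see a diff on every pass. A minimal Python sketch, where `parse_quantity` and `same_threshold` are hypothetical helpers that handle only plain decimals and the `m` (milli) suffix:

```python
from decimal import Decimal


def parse_quantity(text: str) -> Decimal:
    """Parse a plain decimal or milli-suffixed quantity string.

    Hypothetical helper for illustration; it covers only the two forms
    that appear in this thread, not the full Kubernetes quantity grammar.
    """
    text = text.strip()
    if text.endswith("m"):
        # "99500m" means 99500 thousandths, i.e. 99.5
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


def same_threshold(a: str, b: str) -> bool:
    """Compare two threshold strings by numeric value, not by text."""
    return parse_quantity(a) == parse_quantity(b)


if __name__ == "__main__":
    print("99.0" == "99")                    # textual comparison
    print(same_threshold("99.0", "99"))      # numeric comparison
    print(same_threshold("99500m", "99.5"))  # milli form from the report
```

Writing the threshold in the form the operator normalizes to (`"99"` in my case) might avoid the textual diff, assuming that normalization is stable.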

@levan-m
Contributor

levan-m commented Jul 25, 2024

Thanks for reporting the issue @paulbrassard-figure!

As mentioned here, the fix addressed one specific case leading to duplication, namely concurrent reconciliation of the resource. Because the SLO Create API is not idempotent, we can't guarantee that duplication won't happen. It would be great if you could share more details about your setup and how to reproduce the issue with Kyverno and, if possible, without it.
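
The point about idempotency can be sketched with a toy model. This is not the operator's code: `FakeSLOApi`, `Resource`, `reconcile_naive`, and `reconcile_guarded` are hypothetical names. It shows how a create call that returns a new ID every time produces duplicates when reconciliation runs more than once, and how recording the returned ID while serializing reconciles per resource avoids that particular case.

```python
import itertools
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


class FakeSLOApi:
    """Stand-in for a non-idempotent create endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self.slos: Dict[str, str] = {}  # id -> name

    def create(self, name: str) -> str:
        # Every call creates a new object, even if the name already exists.
        with self._lock:
            slo_id = f"slo-{next(self._ids)}"
            self.slos[slo_id] = name
            return slo_id


@dataclass
class Resource:
    """Toy model of a custom resource with a status field."""

    name: str
    status_id: Optional[str] = None
    lock: threading.Lock = field(default_factory=threading.Lock)


def reconcile_naive(api: FakeSLOApi, res: Resource) -> None:
    # Creates on every pass; nothing remembers the earlier create.
    api.create(res.name)


def reconcile_guarded(api: FakeSLOApi, res: Resource) -> None:
    # Serialize reconciles per resource; create only when no ID is recorded.
    with res.lock:
        if res.status_id is None:
            res.status_id = api.create(res.name)


if __name__ == "__main__":
    naive_api, naive_res = FakeSLOApi(), Resource("Error SLO for test-xyz")
    reconcile_naive(naive_api, naive_res)
    reconcile_naive(naive_api, naive_res)
    print(len(naive_api.slos))  # 2: one SLO per reconcile pass

    api, res = FakeSLOApi(), Resource("Error SLO for test-xyz")
    workers = [
        threading.Thread(target=reconcile_guarded, args=(api, res))
        for _ in range(5)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(api.slos))  # 1: later passes see the recorded ID
```

Even in this model the guard is not a guarantee: if a process fails after `create` returns but before the ID is persisted, the next pass would create a second SLO, which is why a non-idempotent create API leaves some room for duplicates.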
