Add helm check feature #1060

Merged
merged 11 commits into from
Feb 23, 2024

@fanny-jiang fanny-jiang commented Jan 30, 2024

What does this PR do?

Add support for the helm check in the operator DatadogAgent v2alpha1 CRD

Motivation

Customer request: make the helm check easier to configure.

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

features.helmCheck.enabled:

  • Agent: v7.35.0
  • Cluster Agent: v1.20.0

features.helmCheck.collectEvents:

  • Agent: v7.36.0
  • Cluster Agent: v1.20.0

features.helmCheck.valuesAsTags:

  • Agent: v7.40.0
  • Cluster Agent: v1.40.0
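
A minimal sketch for comparing a deployed version against the minimums listed above. The helper name version_ge is hypothetical, and it assumes a sort that supports -V (version sort, as in GNU coreutils). How the running Agent or Cluster Agent version is obtained is left out.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# A leading "v" is stripped from both arguments before comparing.
# Assumption: `sort -V` is available on this system.
version_ge() {
  a=${1#v}
  b=${2#v}
  # After a version sort, the smaller version comes first; A >= B iff that is B.
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}
```

For example, `version_ge 7.41.0 v7.40.0 && echo "valuesAsTags supported"` would print the message, while `version_ge 7.35.0 v7.36.0` fails.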

Describe your test plan

  • Validate defaults
  • Validate defaults with clusterChecksRunners
  • Validate features.helmCheck.collectEvents: true
  • Validate features.helmCheck.valuesAsTags

For each test scenario:

  • helm check configmap is created with applicable configs
  • when CCR is enabled, configmap has cluster_check: true, otherwise cluster_check: false
  • helm check clusterRole policy rules are created and do not change
  • helm check clusterRoleBinding roleRefName corresponds to the expected pod running the check (dca vs ccr)
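
The expectations above can be sketched as small shell helpers. The function names are hypothetical, and the `<namespace>-<dda name>-helm-check-<dca|ccr>` naming pattern is inferred from the resource names shown in this test plan (namespace system, DatadogAgent name datadog), not from the operator source.

```shell
# helm_check_suffix CCR_ENABLED: prints "ccr" when cluster checks runners run
# the check, otherwise "dca" (the cluster agent runs it).
helm_check_suffix() {
  if [ "$1" = "true" ]; then echo ccr; else echo dca; fi
}

# expected_rbac_name NAMESPACE DDA_NAME CCR_ENABLED: prints the expected
# ClusterRole / ClusterRoleBinding name (assumed naming pattern, see lead-in).
expected_rbac_name() {
  printf '%s-%s-helm-check-%s\n' "$1" "$2" "$(helm_check_suffix "$3")"
}

# expected_cluster_check CCR_ENABLED: prints the expected configmap line.
expected_cluster_check() {
  if [ "$1" = "true" ]; then
    echo "cluster_check: true"
  else
    echo "cluster_check: false"
  fi
}
```

Calling `expected_rbac_name system datadog true` gives the name to look for when CCR is enabled.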

Validate Defaults

  1. In a kind cluster, deploy dda v2alpha1 with features.helmCheck.enabled: true:
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: system
spec:
  global:
    clusterName: kind-kind
    kubelet:
      tlsVerify: false
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    helmCheck:
      enabled: true
  2. Install any helm chart, e.g. helm install my-release oci://registry-1.docker.io/bitnamicharts/mysql

  3. Verify that the cluster agent is deployed and runs the helm check:

kubectl exec -it <cluster_agent_pod> -- agent status

# Helm check should be collecting metrics and there should be no errors in the agent logs.
kubectl exec -it <cluster_agent_pod> -- tail -f /var/log/datadog/cluster-agent.log
  4. Verify that the helm configMap is present and has the expected default contents: kubectl get configmap datadog-helm-check-config -oyaml
apiVersion: v1
data:
  helm.yaml: |
    ---
    cluster_check: false
    init_config:
    instances:
      - collect_events: false
kind: ConfigMap
metadata:
  name: datadog-helm-check-config
  namespace: system
...
  5. Verify the helm check clusterrole
  • Make sure it is present and contents are correct

kubectl get clusterrole system-datadog-helm-check-dca -oyaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system-datadog-helm-check-dca
...
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - configmaps
  verbs:
  - get
  - list
  - watch
  6. Verify the helm check clusterrolebinding
  • Verify that it is present and contents are correct
  • Make sure the roleRef.Name is correct (system-datadog-helm-check-dca)

kubectl get clusterrolebinding system-datadog-helm-check-dca -oyaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system-datadog-helm-check-dca
...
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system-datadog-helm-check-dca
subjects:
- kind: ServiceAccount
  name: datadog-cluster-agent
  namespace: system
  7. Verify that the helm.release metric is collected in the Datadog metrics explorer
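
The configmap and RBAC checks in this section could be scripted roughly as follows. This is a sketch: has_setting is a hypothetical helper that only looks for a "key: value" line in the helm.yaml text it reads on stdin, and the kubectl commands in the comments assume the current context already points at the namespace used in the manifests above.

```shell
# has_setting KEY VALUE: reads helm.yaml content on stdin and succeeds when a
# "KEY: VALUE" line (optionally indented or prefixed by a list dash) is present.
has_setting() {
  grep -Eq "^[[:space:]-]*$1:[[:space:]]*$2[[:space:]]*\$"
}

# Live-cluster usage (not executed here):
#   kubectl get configmap datadog-helm-check-config -o jsonpath='{.data.helm\.yaml}' \
#     | has_setting cluster_check false
#   kubectl get clusterrolebinding system-datadog-helm-check-dca -o jsonpath='{.roleRef.name}'
#   kubectl auth can-i list secrets \
#     --as=system:serviceaccount:system:datadog-cluster-agent
```

The helper is a plain text match, so it does not validate the YAML structure; it is only meant to flag an unexpected cluster_check or collect_events value quickly.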

Validate defaults with clusterChecksRunners

  1. Deploy dda v2alpha1 with CCR enabled:
  features:
    helmCheck:
      enabled: true
    clusterChecks:
      enabled: true
      useClusterChecksRunners: true
  2. Verify steps 2-7 from "Validate Defaults" above. Everything should remain unchanged except:
  • step 4) helm check configMap should be present and data contents should be:
  helm.yaml: |
    ---
    cluster_check: true
    init_config:
    instances:
      - collect_events: false

See cluster_check: true

  • step 5) helm check clusterRole should be named system-datadog-helm-check-ccr
  • step 6) helm check clusterRoleBinding roleRef.Name should be system-datadog-helm-check-ccr

Validate features.helmCheck.collectEvents: true

  1. In a kind cluster, deploy dda v2alpha1 with the following features configured:
  features:
    helmCheck:
      enabled: true
      collectEvents: true
  1. Cluster-agent pod should redeploy. Uninstall and reinstall helm chart to trigger new events.

Verify steps 3-6 from "Validate Defaults" above. Everything should remain unchanged except for the configmap:

  1. helm check configMap should be present and data contents should be:
  helm.yaml: |
    ---
    cluster_check: false
    init_config:
    instances:
      - collect_events: true
  1. Verify that helm metrics are still collected in the metrics explorer and helm events are collected in the Datadog Events Explorer

  2. Repeat with CCR enabled in the dda. Everything should remain the same except the configmap should have cluster_check: true and the RBAC names should be system-datadog-helm-check-ccr.
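
To compare the scenarios so far, the expected helm.yaml can be generated for a given combination of settings. The helper name render_helm_yaml is hypothetical, and the output simply mirrors the snippets in this test plan; the operator's real output may differ in whitespace or key order, so an exact diff is only a rough check.

```shell
# render_helm_yaml CLUSTER_CHECK COLLECT_EVENTS: prints the helm.yaml content
# expected by this test plan for the given true/false values.
render_helm_yaml() {
  cat <<EOF
---
cluster_check: $1
init_config:
instances:
  - collect_events: $2
EOF
}
```

For the collectEvents scenario without CCR, `render_helm_yaml false true > expected.yaml` produces a file that can be diffed against the helm.yaml data read from the datadog-helm-check-config configmap.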

Validate features.helmCheck.valuesAsTags

  1. In a kind cluster, deploy dda v2alpha1 with the following features configured:
  features:
    helmCheck:
      enabled: true
      collectEvents: true
      valuesAsTags:
        image.tag: image_tag
        image.repository: image_repo

Validate steps 2-6 from "Validate features.helmCheck.collectEvents" above. Everything should remain the same except for the configmap contents.

  1. helm check configMap should be present and data contents should be:
  helm.yaml: |
    ---
    cluster_check: false
    init_config:
    instances:
      - collect_events: true
        valuesAsTags:
          image.tag: image_tag
          image.repository: image_repo
  1. Verify that helm metrics are collected with the new tags in the metrics explorer. Verify that events are collected and have the new tags in the Datadog Events Explorer

  2. Repeat with CCR enabled in the dda. Everything should remain the same except the configmap should have cluster_check: true and the RBAC names are system-datadog-helm-check-ccr.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@fanny-jiang fanny-jiang added the enhancement New feature or request label Jan 30, 2024
@fanny-jiang fanny-jiang added this to the v1.5.0 milestone Jan 30, 2024
@fanny-jiang fanny-jiang requested review from a team as code owners January 30, 2024 19:10
@codecov-commenter

codecov-commenter commented Jan 30, 2024

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (ef43dbb) 58.56% compared to head (9251767) 58.68%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1060      +/-   ##
==========================================
+ Coverage   58.56%   58.68%   +0.12%     
==========================================
  Files         166      169       +3     
  Lines       20533    20644     +111     
==========================================
+ Hits        12025    12115      +90     
- Misses       7782     7799      +17     
- Partials      726      730       +4     
Flag Coverage Δ
unittests 58.68% <81.08%> (+0.12%) ⬆️

Files                                                     Coverage Δ
apis/datadoghq/v2alpha1/datadogagent_types.go             100.00% <ø> (ø)
controllers/datadogagent/controller.go                    58.64% <ø> (ø)
controllers/datadogagent/dependencies/store.go            64.47% <ø> (ø)
...ollers/datadogagent/feature/helmcheck/configmap.go     100.00% <100.00%> (ø)
...ontrollers/datadogagent/feature/helmcheck/const.go     100.00% <100.00%> (ø)
apis/datadoghq/v2alpha1/datadogagent_default.go           90.40% <62.50%> (-0.93%) ⬇️
...trollers/datadogagent/feature/helmcheck/feature.go     74.64% <74.64%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@maycmlee maycmlee left a comment

Left a couple of suggestions

Three review suggestions on docs/configuration.v2alpha1.md (outdated, resolved)
@levan-m

levan-m commented Feb 14, 2024

LGTM, needs just one small fix.

@fanny-jiang fanny-jiang merged commit bb82660 into main Feb 23, 2024
19 checks passed
@fanny-jiang fanny-jiang deleted the fanny/CECO-410/helmcheck branch February 23, 2024 16:24
mftoure pushed a commit that referenced this pull request Oct 3, 2024
* Add helm check feature

* fix tests and imports

* fix comment typo

* Apply docs review suggestions

* apply review suggestions/remove customConf

* newline

* add more tests

* apply review suggestions

* fix flaky configmap tests
Labels
enhancement New feature or request
5 participants