Daemonset stuck in progressing #16951

Open · JBodkin-Amphora opened this issue Jan 22, 2024 · 2 comments
Labels: bug, component:application-controller, component:health-check, version:EOL

Comments

@JBodkin-Amphora

Describe the bug

According to ArgoCD, the daemonset is stuck in the progressing phase, even though it is running on each of the two nodes in the spot node pool.

Clicking on the daemonset shows the following message in the health details: Waiting for daemon set "opentelemetry-collector-agent" rollout to finish: 0 of 3 updated pods are available...

The status field on the live manifest is:

```yaml
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 3
  updatedNumberScheduled: 2
```

The output of `kubectl -n opentelemetry rollout status daemonset/opentelemetry-collector-agent` is `daemon set "opentelemetry-collector-agent" successfully rolled out`.
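
For reference, the live status shown above can be pulled straight from the cluster with something like (namespace and name as in the manifest below):

```sh
kubectl -n opentelemetry get daemonset opentelemetry-collector-agent \
  -o jsonpath='{.status}'
```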

To Reproduce

Deploy the OpenTelemetry Collector as an application with two node pools on Azure:

  1. System Node Pool
  2. Spot Node Pool
```yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  namespace: argocd
  name: opentelemetry-collector
spec:
  project: default
  source:
    chart: opentelemetry-collector
    repoURL: https://open-telemetry.github.io/opentelemetry-helm-charts
    targetRevision: 0.78.1
    helm:
      valuesObject:
        mode: daemonset
        tolerations:
          - key: kubernetes.azure.com/scalesetpriority
            operator: Equal
            value: spot
            effect: NoSchedule
  destination:
    server: https://kubernetes.default.svc
    namespace: opentelemetry
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - CreateNamespace=true
```
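
For reference, a spot node pool like the one in step 2 can be created along these lines with the Azure CLI; `my-rg` and `my-aks-cluster` are placeholder names, and AKS adds the `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` taint to spot pools automatically:

```sh
# Placeholder resource group and cluster names; adjust for your environment.
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks-cluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 2
```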

Expected behavior

The daemonset should be marked as healthy because it is running two pods, one on each of the spot nodes. The tolerations do not allow the daemonset to run on the system node pool, as the pod does not have the critical addons toleration. Since daemonsets are a built-in Kubernetes resource, I would expect this to work without having to implement a custom health check.
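
As a stopgap (not what this issue is asking for), Argo CD's built-in assessment can be overridden with a resource health customization in `argocd-cm`. A minimal Lua sketch that reports Healthy once the status fields shown above are consistent with the spec:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Override the built-in health assessment for apps/DaemonSet.
  resource.customizations.health.apps_DaemonSet: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for daemon set rollout to finish"
    -- Only trust the status once the controller has observed the latest spec.
    if obj.status ~= nil and obj.status.observedGeneration ~= nil
       and obj.metadata.generation <= obj.status.observedGeneration then
      if obj.status.updatedNumberScheduled ~= nil
         and obj.status.desiredNumberScheduled ~= nil
         and obj.status.updatedNumberScheduled >= obj.status.desiredNumberScheduled
         and obj.status.numberAvailable ~= nil
         and obj.status.numberAvailable >= obj.status.desiredNumberScheduled then
        hs.status = "Healthy"
        hs.message = "All desired pods are scheduled and available"
      end
    end
    return hs
```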


Version

```text
argocd: v2.9.3+6eba5be
  BuildDate: 2023-12-01T23:24:09Z
  GitCommit: 6eba5be864b7e031871ed7698f5233336dfe75c7
  GitTreeState: clean
  GoVersion: go1.21.4
  Compiler: gc
  Platform: windows/amd64
argocd-server: v2.9.4+bb06722
```
@JBodkin-Amphora added the bug label on Jan 22, 2024
@Samir-NT (Contributor) commented on Feb 20, 2024

Same issue here (v2.10.1+a79e0ea), but this looks like an (earlier) duplicate of #17208.

@andrii-korotkov-verkada (Contributor) commented

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and let us know if the issue is still present, please?

@andrii-korotkov-verkada added the version:EOL label on Nov 11, 2024