App-of-Apps Related Bugs #5126

Closed
rajivml opened this issue Dec 27, 2020 · 6 comments

Labels: bug (Something isn't working)

rajivml commented Dec 27, 2020

ArgoCD Version: 1.8.1

Scenario 1:

In the screenshot below, several apps are stuck in the Unknown state. All of them require a ConfigMap that has not been created yet. Shouldn't these deployments fail after some time? Instead they stay in this state forever. Is this expected?

image

Scenario 2:
I am deleting the parent app because all of these child apps are stuck in the Unknown state, but the deletion is stuck as well: it has been showing "Deleting" for the last 15 minutes, and neither the child apps nor the parent have been deleted.

Scenario 3:
Since scenario 2 isn't working, I tried deleting the child apps directly. They are deleted fine individually, but once they are gone the parent app gets stuck in the Deleting state forever, apparently because it keeps polling for the health of dependents that no longer exist.

To work around scenario 3, I have to remove the finalizer on the app.
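
For context, here is a minimal sketch of where that finalizer lives, using the finalizer name from the Application manifest posted later in this thread. The app name `parent-app` is hypothetical and the fragment omits `spec`. Removing the entry is a workaround only: without it, Argo CD is expected to delete the Application object without cascading to the resources it manages, so those may need manual cleanup.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: parent-app        # hypothetical name, not taken from this issue
  namespace: argocd
  # Removing this finalizers block (for example with `kubectl edit`)
  # is the workaround that lets the stuck delete complete.
  finalizers:
  - resources-finalizer.argocd.argoproj.io
```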

Scenario 4:
I am using app-of-apps, and the order in which apps are deployed matters to me: app1 -> app2 -> app3. I have configured the order using sync-wave, but Argo does not check whether the app1 deployment succeeded before triggering the next app; it just triggers the deployment of all the other apps in the specified sequence. I'm not sure if this is expected.
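
A possibly relevant detail: the Argo CD documentation notes that the built-in health assessment for `argoproj.io/Application` was removed in 1.8, and that it may need to be restored when sync waves are used to orchestrate an app-of-apps. The sketch below follows the custom health check described there, added to the `argocd-cm` ConfigMap. Whether it resolves this scenario is not confirmed anywhere in this thread.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations: |
    argoproj.io/Application:
      health.lua: |
        -- Report the child Application's own health so that a sync wave
        -- waits for it before moving on to the next wave.
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            if obj.status.health.message ~= nil then
              hs.message = obj.status.health.message
            end
          end
        end
        return hs
```

With a check like this in place, the parent app should treat each child Application as healthy only once the child itself reports Healthy, which is what wave-by-wave ordering depends on.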

Scenario 5:
I am creating this ConfigMap based on input received from the user via the Argo UI, and I add the label (secret-copier: yes) when creating it, but somehow Argo removes whatever labels I add in the declaration. This is not app-of-apps specific, but it is blocking me.

kind: ConfigMap 
apiVersion: v1 
metadata:
  name: fabric
  namespace: argocd
  labels:
    secret-copier: yes
  annotations:
    argocd.argoproj.io/sync-wave: "-10"
data:
  ingressHost: {{ .Values.ingress.host }}
  sqlHost: {{ .Values.sql.host }}
  sqlUsername: {{ .Values.sql.username }}
  sqlPassword: {{ .Values.sql.password }}
  imageTag: "{{ .Values.imageTag }}"
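
One thing that may be worth checking (an assumption, not confirmed anywhere in this thread): under YAML 1.1 rules an unquoted `yes` is read as a boolean rather than a string, while Kubernetes label values and ConfigMap data values must be strings. A variant of the manifest above with the values quoted explicitly, using Helm's `quote` function, might look like this:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fabric
  namespace: argocd
  labels:
    secret-copier: "yes"   # quoted so the value stays a string, not a boolean
  annotations:
    argocd.argoproj.io/sync-wave: "-10"
data:
  # `quote` wraps each rendered value in double quotes
  ingressHost: {{ .Values.ingress.host | quote }}
  sqlHost: {{ .Values.sql.host | quote }}
  sqlUsername: {{ .Values.sql.username | quote }}
  sqlPassword: {{ .Values.sql.password | quote }}
  imageTag: {{ .Values.imageTag | quote }}
```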

Can you please let me know whether all of these scenarios are expected? If not, please tell me which version I should fall back to so that I can resume my POC work.

Thank you !!

rajivml added the bug label Dec 27, 2020

rajivml commented Dec 27, 2020

It looks like some of these issues are related to my setting "syncOptions: - Validate=false".

rajivml commented Dec 27, 2020

image

image

image

Also, with app-of-apps, the delete has now been stuck for the last 15 minutes and I'm not sure what's going on. This is what happens whenever a child app goes into the Unknown state.

rajivml commented Dec 27, 2020

Somehow I am consistently running into this stuck-deletion issue. I don't have automated sync enabled for any of the apps. I use the same properties, shown below, in all the Applications, deploy them all with app-of-apps, and use sync-waves to order them.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kustomize-guestbook
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
  annotations:
    argocd.argoproj.io/sync-wave: "5"
spec:
  destination:
    namespace: kustomize-guestbook
    server: {{ .Values.spec.destination.server }}
  project: default
  source:
    path: kustomize-guestbook
    repoURL: {{ .Values.spec.source.repoURL }}
    targetRevision: {{ .Values.spec.source.targetRevision }}

  # Sync policy
  syncPolicy:
    automated: # automated sync by default retries failed attempts 5 times with following delays between attempts ( 5s, 10s, 20s, 40s, 80s ); retry controlled using `retry` field.
      prune: false # Specifies if resources should be pruned during auto-syncing ( false by default ).
      selfHeal: false # Specifies if partial app sync should be executed when resources are changed only in target Kubernetes cluster and no git change detected ( false by default ).
      allowEmpty: false # Allows deleting all application resources during automatic syncing ( false by default ).
    syncOptions:     # Sync options which modifies sync behavior
    - Validate=true # Validate=false disables resource validation (equivalent to 'kubectl apply --validate=false'); validation is enabled by default.
    - CreateNamespace=true # Namespace Auto-Creation ensures that namespace specified as the application destination exists in the destination cluster.
    # The retry feature is available since v1.7
    retry:
      limit: {{ .Values.spec.retry.limit }} # number of failed sync attempt retries; unlimited number of attempts if less than 0
      backoff:
        duration: {{ .Values.spec.retry.backoff.duration }} # the amount to back off. Default unit is seconds, but could also be a duration (e.g. "2m", "1h")
        factor: {{ .Values.spec.retry.backoff.factor }} # a factor to multiply the base duration after each failed retry
        maxDuration: {{ .Values.spec.retry.backoff.maxDuration }} # the maximum amount of time allowed for the backoff strategy

  # Ignore differences at the specified json pointers
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas

rajivml commented Dec 30, 2020

One more issue: I deleted a child application because of a misconfiguration in its deployment, but the parent workflow, instead of failing, got stuck in the Syncing state. I think it is polling for the status of the app that has been killed/deleted.

I don't see any option other than deleting the parent workflow altogether. Shouldn't it just fail because the child has failed and remain in the OutOfSync state, so that a re-sync fixes the issue?

image

jessesuen (Member) commented

Haven't looked at all your specific issues, but we are aware of many known issues with the app-of-apps pattern (#4680 is relevant), which is what ApplicationSets is attempting to solve.

rajivml commented Jan 6, 2021

@jessesuen yes, this looks like the crux of the issue, and it's a complete mess, especially when it comes to deletion. Just now I ran into this issue again: a delete has been stuck for the last 2 hours.

I have read the spec (Google design doc) for ApplicationSets. It doesn't support sync-wave ordering at the Application level, and we need that to define the order in which applications are installed one after the other.

If some low-hanging issue is causing all this, can it be fixed within app-of-apps itself? I'm not sure when ApplicationSets will be production ready.

jgwest closed this as not planned Feb 8, 2024