GitOps - multi-cluster-hub-spoke-argocd - App of AppSets #1908
Comments
@csantanapr can you please comment here?
Small update: I have had minor success using a combination of the cluster generator, the list generator, and the progressive-sync feature of a single ApplicationSet. This is not ideal, as it requires me to combine the apps into one ApplicationSet if I want to create them in a specific order with progressive sync. It also removes the ability to enable each addon individually and forces group enablement. I still need to test whether I can use sync waves to ensure this ApplicationSet gets created before the addons ApplicationSet. However, per the ApplicationSet documentation: "Progressive Syncs watch for the managed Application resources to become "Healthy" before proceeding to the next stage." (ref: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Progressive-Syncs/) So sync waves should be respected to force the order.
** This is not 100% working yet, but I am posting for reference **
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: addons-core
spec:
goTemplate: true
generators:
- matrix:
generators:
- clusters: {}
- list:
elements:
- item_metadata:
namespace: karpenter
name: karpenter
coreAddonType: compute
content:
repoURL: 'public.ecr.aws'
chart: 'karpenter/karpenter'
targetRevision: 'v0.34.3'
helm:
releaseName: 'karpenter'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/karpenter/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/karpenter/values.yaml
values: |
settings:
aws:
clusterName: {{index .metadata.annotations "aws_cluster_name"}}
defaultInstanceProfile: {{index .metadata.annotations "karpenter_node_instance_profile_name"}}
interruptionQueueName: {{index .metadata.annotations "karpenter_sqs_queue_name"}}
serviceAccount:
name: {{index .metadata.annotations "karpenter_service_account"}}
annotations:
eks.amazonaws.com/role-arn: {{index .metadata.annotations "karpenter_iam_role_arn"}}
tolerations:
- key: 'eks.amazonaws.com/compute-type'
value: 'fargate'
operator: 'Equal'
effect: 'NoSchedule'
- item_metadata:
namespace: karpenter
name: 'karpenter-resources'
coreAddonType: resources
content:
repoURL: '{{index .metadata.annotations "addons_repo_url"}}'
path: '{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/resources'
targetRevision: '{{index .metadata.annotations "addons_repo_revision"}}'
helm:
releaseName: 'karpenter-resources'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/resources/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/karpenter/resources/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/karpenter/resources/values.yaml
values: |
metadata:
cluster_name: {{index .metadata.annotations "aws_cluster_name"}}
cluster_kms_key_arn: {{index .metadata.annotations "cluster_kms_key_arn"}}
cluster_environment: {{index .metadata.labels "environment"}}
- item_metadata:
namespace: 'kube-system'
name: 'metrics-server'
coreAddonType: metrics
content:
repoURL: 'https://kubernetes-sigs.github.io/metrics-server'
chart: 'metrics-server'
targetRevision: '3.11.0'
helm:
releaseName: 'metrics-server'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/metrics-server/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/metrics-server/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/metrics-server/values.yaml
strategy:
type: RollingSync
rollingSync:
steps:
- matchExpressions:
- key: coreAddonType
operator: In
values:
- compute
maxUpdate: 25%
- matchExpressions:
- key: coreAddonType
operator: In
values:
- metrics
maxUpdate: 25%
- matchExpressions:
- key: coreAddonType
operator: In
values:
- resources
maxUpdate: 25%
template:
metadata:
name: '{{.name}}-{{.item_metadata.name}}'
labels:
coreAddonType: '{{.item_metadata.coreAddonType}}'
spec:
project: default
sources:
- repoURL: '{{.content.repoURL}}'
targetRevision: '{{.content.targetRevision}}'
helm:
releaseName: '{{.content.releaseName}}'
ignoreMissingValueFiles: true
values: |
{{.content.helm.values}}
syncPolicy:
automated:
prune: true
destination:
name: '{{.name}}'
namespace: '{{.item_metadata.namespace}}'
templatePatch: |
spec:
sources:
- repoURL: '{{index .metadata.annotations "addons_repo_url"}}'
targetRevision: '{{index .metadata.annotations "addons_repo_revision"}}'
ref: values
- repoURL: '{{.content.repoURL}}'
{{if .content.chart}}
chart: '{{.content.chart}}'
{{end}}
{{if .content.path}}
path: '{{.content.path}}'
{{end}}
targetRevision: '{{.content.targetRevision}}'
helm:
releaseName: '{{.content.helm.releaseName}}'
ignoreMissingValueFiles: true
valueFiles:
{{- range $valueFile := .content.helm.valueFiles }}
- {{ $valueFile }}
{{- end }}
values: {{ .content.helm.values | toYaml | indent 20}}
{{- if .autoSync }}
syncPolicy:
automated:
prune: {{ .prune }}
{{- end }}
Any thoughts on this? I believe that a mechanism to control the deployment order of addons is critical for this pattern to be usable in a production environment. I've made more progress with the core-addons example posted above, which I can share next week after returning from holiday. However, I don't believe this is ideal, as it requires grouping apps and treats them as a package instead of as individual addons.
I abandoned trying to group apps in a single ApplicationSet in order to control sync order. It felt messy and had negative trade-offs in my opinion. @csantanapr brought up a valid (and much preferred) solution on another thread, mentioned in the following issue:
@SnowBiz I have been discussing ordering of addons for the gitops-bridge project in the CNCF Slack with other Argo CD community members like Jan and Christian. For a single/standalone cluster, sync waves on ApplicationSets do get deployed in order, but the controller only waits 2 seconds between them. I shared the workaround (i.e. increase the delay to 30 seconds) and the GitHub repo I have been using for experiments. If we implement health status in ApplicationSets, we can remove the artificial large timeout of 30 seconds. For multi-cluster hub-spoke, you're spot on that we need ordering between apps/addons from different ApplicationSets with the current layout, or do what you did: deploy the different addons from a single ApplicationSet and use progressive sync. The ideal solution Jan is working on is to implement a dependency graph.
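For reference, a minimal sketch of the 30-second experiment mentioned above, assuming a standard Argo CD install layout: the delay between sync waves is controlled by the documented ARGOCD_SYNC_WAVE_DELAY environment variable on the application controller (the 30-second value is the experiment referenced here, not an official default). The snippet is a strategic-merge patch, e.g. applied via Kustomize or kubectl patch:
# Patch for the argocd-application-controller StatefulSet on the hub cluster.
# Raises the pause between sync waves from the 2-second default to 30 seconds,
# giving each wave's ApplicationSet time to create and sync its Applications
# before the next wave starts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_SYNC_WAVE_DELAY   # delay in seconds; default is 2
              value: "30"
Each child ApplicationSet rendered by the parent "app of appSets" Application would then carry an argocd.argoproj.io/sync-wave annotation so the parent applies them in the intended order.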
This issue has been automatically marked as stale because it has been open for 30 days.
Issue closed due to inactivity. |
Please describe your question here
The current pattern used in the GitOps example relies on the GitOps Bridge pattern to pass enablement metadata from the IaC side to the corresponding ApplicationSets. Using this metadata, the ApplicationSets can be selectively enabled and can also consume outputs from the Terraform stack.
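For context, that metadata travels on the Argo CD cluster Secret that the cluster generator reads. A minimal sketch with placeholder names and values (the annotation keys mirror the ones consumed by the ApplicationSet shown in the comments above):
apiVersion: v1
kind: Secret
metadata:
  name: spoke-prod                      # placeholder cluster name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: prod                   # read via {{index .metadata.labels "environment"}}
    enable_metrics_server: "true"       # example enablement flag written by Terraform
  annotations:
    addons_repo_url: https://github.com/example-org/eks-blueprints-add-ons   # placeholder
    addons_repo_basepath: argocd/
    addons_repo_revision: main
    aws_cluster_name: spoke-prod
type: Opaque
stringData:
  name: spoke-prod
  server: https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com   # placeholder API endpoint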
The issue I am currently facing is related to resource constraints. When multiple ApplicationSets are enabled, they all spawn their child apps in unison. This is compounded when following the hub-spoke model for multi-cluster management, since three copies of an app are created whenever I deploy common cluster tooling. This ultimately destabilizes the application controller, which then fails with out-of-memory (OOM) errors. The Argo application controller is unable to recover, and applications fail to sync.
I would like to control the sync order of the individual applications. That would let me create the metrics server first when standing up a new cluster, so that as load ramps up on the controllers, the horizontal pod autoscalers (HPAs) have the metrics they need to scale out, allowing Argo to withstand the initial wave of applications.
There are mechanisms designed for situations like this, such as sync waves and progressive syncs, a new alpha feature of ApplicationSets. Both allow control over sync order: sync waves can be used with a traditional "app of apps" pattern, and progressive syncs apply when using ApplicationSets. However, with the "app of appSets" pattern, neither mechanism works. Sync waves are not effective across ApplicationSets, and progressive sync would only work if all children were Applications instead of ApplicationSets (and would also require nesting them under one parent ApplicationSet). The downside is that moving away from ApplicationSets in the middle tier would remove the dynamic app enablement used in the GitOps Bridge model. A minimal sketch of the sync-wave annotation approach for the traditional app-of-apps case is shown after the example pattern below.
Example Pattern:
addons -> example-app-applicationSet -> children apps
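As referenced above, in the traditional app-of-apps case each child Application rendered by the parent would simply carry a sync-wave annotation so that lightweight dependencies such as metrics-server sync before heavier addons. A minimal sketch with illustrative names and destination:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # lower waves sync before higher ones
spec:
  project: default
  destination:
    name: in-cluster                     # placeholder destination cluster
    namespace: kube-system
  source:
    repoURL: https://kubernetes-sigs.github.io/metrics-server
    chart: metrics-server
    targetRevision: 3.11.0
  syncPolicy:
    automated:
      prune: true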
My question: is there a clean and effective way to control the sync order of applications using this pattern? There are many scenarios where you would want one tool created before another; another example is clusters that use Fargate to host Argo CD and Karpenter, where all child apps should land on a node pool provided by Karpenter.
Provide a link to the example/module related to the question
https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/gitops/multi-cluster-hub-spoke-argocd
https://github.com/aws-samples/eks-blueprints-add-ons/tree/main/argocd/bootstrap/control-plane/addons/aws