GitOps - multi-cluster-hub-spoke-argocd - App of AppSets #1908
Comments
@csantanapr can you please comment here?
Small update: I have had minor success using a combination of the cluster generator, the list generator, and the progressive-sync feature of a single ApplicationSet. This is not ideal, as it requires me to combine the apps into one ApplicationSet if I want to create them in a specific order with progressive sync. It also removes the ability to enable each addon individually and forces group enablement. I still need to test whether I can use sync waves to ensure this ApplicationSet gets created before the addons ApplicationSet. However, per the ApplicationSet documentation: "Progressive Syncs watch for the managed Application resources to become "Healthy" before proceeding to the next stage." (ref: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Progressive-Syncs/) So sync waves should be respected to force the order.
** This is not 100% working yet, but I am posting for reference **
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: addons-core
spec:
goTemplate: true
generators:
- matrix:
generators:
- clusters: {}
- list:
elements:
- item_metadata:
namespace: karpenter
name: karpenter
coreAddonType: compute
content:
repoURL: 'public.ecr.aws'
chart: 'karpenter/karpenter'
targetRevision: 'v0.34.3'
helm:
releaseName: 'karpenter'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/karpenter/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/karpenter/values.yaml
values: |
settings:
aws:
clusterName: {{index .metadata.annotations "aws_cluster_name"}}
defaultInstanceProfile: {{index .metadata.annotations "karpenter_node_instance_profile_name"}}
interruptionQueueName: {{index .metadata.annotations "karpenter_sqs_queue_name"}}
serviceAccount:
name: {{index .metadata.annotations "karpenter_service_account"}}
annotations:
eks.amazonaws.com/role-arn: {{index .metadata.annotations "karpenter_iam_role_arn"}}
tolerations:
- key: 'eks.amazonaws.com/compute-type'
value: 'fargate'
operator: 'Equal'
effect: 'NoSchedule'
- item_metadata:
namespace: karpenter
name: 'karpenter-resources'
coreAddonType: resources
content:
repoURL: '{{index .metadata.annotations "addons_repo_url"}}'
path: '{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/resources'
targetRevision: '{{index .metadata.annotations "addons_repo_revision"}}'
helm:
releaseName: 'karpenter-resources'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/karpenter/resources/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/karpenter/resources/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/karpenter/resources/values.yaml
values: |
metadata:
cluster_name: {{index .metadata.annotations "aws_cluster_name"}}
cluster_kms_key_arn: {{index .metadata.annotations "cluster_kms_key_arn"}}
cluster_environment: {{index .metadata.labels "environment"}}
- item_metadata:
namespace: 'kube-system'
name: 'metrics-server'
coreAddonType: metrics
content:
repoURL: 'https://kubernetes-sigs.github.io/metrics-server'
chart: 'metrics-server'
targetRevision: '3.11.0'
helm:
releaseName: 'metrics-server'
ignoreMissingValueFiles: true
valueFiles:
- $values/{{index .metadata.annotations "addons_repo_basepath"}}charts/addons/metrics-server/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}environments/{{index .metadata.labels "environment"}}/addons/metrics-server/values.yaml
- $values/{{index .metadata.annotations "addons_repo_basepath"}}clusters/{{.name}}/addons/metrics-server/values.yaml
strategy:
type: RollingSync
rollingSync:
steps:
- matchExpressions:
- key: coreAddonType
operator: In
values:
- compute
maxUpdate: 25%
- matchExpressions:
- key: coreAddonType
operator: In
values:
- metrics
maxUpdate: 25%
- matchExpressions:
- key: coreAddonType
operator: In
values:
- resources
maxUpdate: 25%
template:
metadata:
name: '{{.name}}-{{.item_metadata.name}}'
labels:
coreAddonType: '{{.item_metadata.coreAddonType}}'
spec:
project: default
sources:
- repoURL: '{{.content.repoURL}}'
targetRevision: '{{.content.targetRevision}}'
helm:
releaseName: '{{.content.releaseName}}'
ignoreMissingValueFiles: true
values: |
{{.content.helm.values}}
syncPolicy:
automated:
prune: true
destination:
name: '{{.name}}'
namespace: '{{.item_metadata.namespace}}'
templatePatch: |
spec:
sources:
- repoURL: '{{index .metadata.annotations "addons_repo_url"}}'
targetRevision: '{{index .metadata.annotations "addons_repo_revision"}}'
ref: values
- repoURL: '{{.content.repoURL}}'
{{if .content.chart}}
chart: '{{.content.chart}}'
{{end}}
{{if .content.path}}
path: '{{.content.path}}'
{{end}}
targetRevision: '{{.content.targetRevision}}'
helm:
releaseName: '{{.content.helm.releaseName}}'
ignoreMissingValueFiles: true
valueFiles:
{{- range $valueFile := .content.helm.valueFiles }}
- {{ $valueFile }}
{{- end }}
values: {{ .content.helm.values | toYaml | indent 20}}
{{- if .autoSync }}
syncPolicy:
automated:
prune: {{ .prune }}
{{- end }}
Any thoughts on this? I believe that a mechanism to control the deployment order of addons is critical for this pattern to be usable in a production environment. I've made more progress with the core-addons example posted above, which I can share next week after returning from holiday. However, I don't believe this is ideal, as it requires grouping apps and treats them as a package instead of as individual addons.
I abandoned trying to group apps in a single ApplicationSet in order to control sync order. It felt messy and had negative trade-offs in my opinion. @csantanapr brought up a valid (and much preferred) solution on another thread, mentioned in the following issue:
@SnowBiz I have been discussing ordering of addons for the gitops-bridge project in the CNCF Slack with other Argo CD community members like Jan and Christian. For a single/standalone cluster, sync waves on ApplicationSets do get deployed in order, but the controller only waits 2 seconds between them. I shared the workaround (i.e. increase the delay to 30 seconds) and the GitHub repo I have been using for experiments. If we implement health status in ApplicationSets, we can remove the artificial large timeout of 30 seconds. For multi-cluster hub-spoke, you're spot on that we need ordering between apps/addons from different ApplicationSets with the current layout, or do what you did: deploy the different addons from a single ApplicationSet and use progressive sync. The ideal solution Jan is working on is to implement a dependency graph.
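For reference, a minimal sketch of the 30-second experiment mentioned above, assuming a standard Argo CD install layout: the delay between sync waves is controlled by the documented ARGOCD_SYNC_WAVE_DELAY environment variable on the application controller (the 30-second value is the experiment referenced here, not an official default). The snippet is a strategic-merge patch, e.g. applied via Kustomize or kubectl patch:
# Patch for the argocd-application-controller StatefulSet on the hub cluster.
# Raises the pause between sync waves from the 2-second default to 30 seconds,
# giving each wave's ApplicationSet time to create and sync its Applications
# before the next wave starts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_SYNC_WAVE_DELAY   # delay in seconds; default is 2
              value: "30"
Each child ApplicationSet rendered by the parent "app of appSets" Application would then carry an argocd.argoproj.io/sync-wave annotation so the parent applies them in the intended order.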
This issue has been automatically marked as stale because it has been open for 30 days.
Issue closed due to inactivity. |
Please describe your question here
The current pattern used in the GitOps example relies on the GitOps Bridge pattern to pass enablement metadata from the IaC side to the corresponding ApplicationSets. Using this metadata, the ApplicationSets can be selectively enabled and can also consume outputs from the Terraform stack.
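For context, that metadata travels on the Argo CD cluster Secret that the cluster generator reads. A minimal sketch with placeholder names and values (the annotation keys mirror the ones consumed by the ApplicationSet shown in the comments above):
apiVersion: v1
kind: Secret
metadata:
  name: spoke-prod                      # placeholder cluster name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: prod                   # read via {{index .metadata.labels "environment"}}
    enable_metrics_server: "true"       # example enablement flag written by Terraform
  annotations:
    addons_repo_url: https://github.com/example-org/eks-blueprints-add-ons   # placeholder
    addons_repo_basepath: argocd/
    addons_repo_revision: main
    aws_cluster_name: spoke-prod
type: Opaque
stringData:
  name: spoke-prod
  server: https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com   # placeholder API endpoint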
The issue I am currently facing is related to resource constraints. When multiple ApplicationSets are enabled, they all spawn their child apps in unison. This is compounded when following the hub-spoke model for multi-cluster management, since three copies of an app are created whenever I deploy common cluster tooling. This ultimately destabilizes the application controller, which then fails with out-of-memory (OOM) errors. The Argo application controller is unable to recover, and applications fail to sync.
I would like to control the sync order of the individual applications. That would let me create the metrics server first when standing up a new cluster, so that as load ramps up on the controllers, the horizontal pod autoscalers (HPAs) have the metrics they need to scale out, allowing Argo to withstand the initial wave of applications.
There are mechanisms designed for situations like this, such as sync waves and progressive syncs, a new alpha feature of ApplicationSets. Both allow control over sync order: sync waves can be used with a traditional "app of apps" pattern, and progressive syncs apply when using ApplicationSets. However, with the "app of appSets" pattern, neither mechanism works. Sync waves are not effective across ApplicationSets, and progressive sync would only work if all children were Applications instead of ApplicationSets (and would also require nesting them under one parent ApplicationSet). The downside is that moving away from ApplicationSets in the middle tier would remove the dynamic app enablement used in the GitOps Bridge model. A minimal sketch of the sync-wave annotation approach for the traditional app-of-apps case is shown after the example pattern below.
Example Pattern:
addons -> example-app-applicationSet -> children apps
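As referenced above, in the traditional app-of-apps case each child Application rendered by the parent would simply carry a sync-wave annotation so that lightweight dependencies such as metrics-server sync before heavier addons. A minimal sketch with illustrative names and destination:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # lower waves sync before higher ones
spec:
  project: default
  destination:
    name: in-cluster                     # placeholder destination cluster
    namespace: kube-system
  source:
    repoURL: https://kubernetes-sigs.github.io/metrics-server
    chart: metrics-server
    targetRevision: 3.11.0
  syncPolicy:
    automated:
      prune: true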
My question: is there a clean and effective way to control the sync order of applications using this pattern? There are many scenarios where you would want one tool created before another; another example is clusters that use Fargate to host Argo CD and Karpenter, where all child apps should land on a node pool provided by Karpenter.
Provide a link to the example/module related to the question
https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/gitops/multi-cluster-hub-spoke-argocd
https://github.com/aws-samples/eks-blueprints-add-ons/tree/main/argocd/bootstrap/control-plane/addons/aws